Abstract

We propose sequential and parallel algorithms for three different vehicle routing
problems. We propose a sequential algorithm and a parallel one for the all pairs shortest
routing problem. For the single vehicle routing problem, when all the locations are on a
line and have either deadline times or release times associated with them, we propose
$O(n^2 \log n)$ cost and $O(n)$ time optimal parallel algorithms. We propose a sequential
and two parallel algorithms for the traveling repairman problem, when all the locations
are on a line and have no time windows associated with them.

For the problem of operator coalescing with precedence constraints, we propose an
NC algorithm that runs in O(log'^n) time with O(n^logn) cost on a CREW PRAM.
The problem here is to schedule the operations of two jobs such that the reward
obtained by coalescing operations is maximized.

We propose sub-cubic cost algorithms for the all pairs longest path (APLP) problem 
and the all pairs longest distance (APLD) problem for directed acyclic graphs. The first 
parallel algorithm solves the APLD problem, for a directed acyclic graph with unit edge 
costs. The second one solves the APLP problem and consequently APLD problem for a 
directed acyclic graph with non negative edge costs. 

We present an NC algorithm for scheduling n unit length tasks on m identical processors
for the case where the precedence constraint is an interval order. The algorithm
produces the same schedule as the one produced by the list scheduling algorithm. Here,
we reduce the cost of the best known parallel algorithm by a factor of 0(log-^ n).

Chapter 1
Introduction 


In recent years, researchers have devoted tremendous effort to developing
fast and efficient parallel algorithms. Advances in VLSI technology have provided a
cost-effective means of obtaining increased computational power by way of multiprocessor
machines. These machines consist of anywhere from a few processors to many thousands, and
potentially millions, of processors. Parallel processing involves employing more than one
processor on a task and making them cooperate in various ways to solve a computationally
intensive problem. However, there is a strict upper bound on processor speeds. This
bound is dictated by the speed of light and the minimum distance required on a VLSI chip
between two components (so that they do not interact with each other). Hence,
hardware technology alone may not satisfy the ever-increasing demand for computational
power. Although we can parallelize some operations of sequential algorithms (through
software), this is not effective, as a number of complex operations pose problems. So, it
appears that designing algorithms for a parallel model is the only practical approach to
exploiting the (massive) parallelism available from these multiprocessor machines.

An algorithm is a sequence of steps (to solve a particular problem). A sequential 
algorithm specifies the actions to be taken by a single processor for solving the problem. 
In a sequential algorithm, only a single instruction is executed at any time. In a parallel 
algorithm, the algorithm is executed by more than one processor simultaneously thereby 
reducing the computation time. We have to extract the parallelism inherent in the 
problem to obtain good parallel algorithms. In this thesis, we develop
efficient parallel algorithms for some optimization problems. In Section 1.1, we give a
brief introduction to the problems chosen for this thesis. In Section 1.2, we give a brief
introduction to the model of computation used.

1.1 Problems

In this section, we introduce the problems that have been considered in this thesis.

1. Vehicle routing problems

Consider a complete directed graph in which each arc has a given length. There is a set
of jobs, each job $i$ located at some node of the graph, with an associated processing time
or handling time $h_i$. Execution of job $i$ has to start within a pre-specified time window
$[r_i, d_i]$. We have a vehicle that can move on the arcs of the graph at unit speed and
that has to execute all jobs within their respective time windows. We consider three
different problems in this class of vehicle routing problems. The first problem is to find
minimum time cost routes between all pairs of nodes in a network; this is called the
all pairs shortest routing (APSR) problem. The second problem is the single vehicle routing
problem, in which a vehicle services all locations in a network in the minimum amount of time.
The general problem is NP-complete, but $O(n^2)$ time algorithms have been developed for the
case when the underlying network is a line, there is no time cost in servicing a location, and
all the time windows are unbounded at either their lower or upper end. If the time
windows are unbounded at the lower end, we call it the dSVRPTW-line problem, and if they are
unbounded at the upper end, we call it the rSVRPTW-line problem. We show that, under the same
conditions, we can reduce this problem to the shortest path problem in a layered graph
and hence obtain a parallel algorithm. We finally consider the traveling repairman problem,
which is similar to the single vehicle routing problem except that we need to minimize the sum
of the waiting times of all locations. The general problem in this case is also NP-complete. But
there is an $O(n^2)$ time algorithm when the network is a line, handling times are zero, and
there are no time bounds associated with locations. We call this the TRP-line problem.

2. Coalescing operations in real-time systems 

In real-time systems, an object usually makes scheduling decisions in order to maximize
the system performance. There are many ways to make scheduling decisions. Operation
coalescence is one of them [9]. The idea is that some objects may be able to handle
several distinct requests at the same time. For example, suppose a stack object is
implemented in a system and the public or primitive operations defined for the stack
object are push, pop, new and top. We may provide a coalesced operation double-push,
which takes two elements and pushes both of them onto the stack at the same
time. Whenever two consecutive push operations are found at the front of the request queue
for the stack object, the object scheduler can invoke the coalesced operation double-push
instead. We present an NC algorithm for coalescing operations with precedence
constraints in real-time systems when the number of jobs is two.
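The double-push example can be illustrated with a small sketch. This is only an illustration of the idea of coalescing two consecutive push requests at the front of a request queue; the function name `schedule`, the tuple encoding of requests, and the log of invoked operations are assumptions made here, and the sketch is not the NC algorithm of Chapter 4.

```python
from collections import deque


def schedule(requests):
    """Serve a request queue for a stack object, coalescing pushes.

    `requests` is a list of tuples such as ("push", 7) or ("pop",).
    This encoding is hypothetical and chosen only for illustration.
    Returns the final stack contents and the list of operations invoked.
    """
    queue = deque(requests)
    stack, invoked = [], []
    while queue:
        op = queue.popleft()
        if op[0] == "push" and queue and queue[0][0] == "push":
            # Two consecutive pushes at the front: invoke the coalesced operation.
            nxt = queue.popleft()
            stack.extend([op[1], nxt[1]])
            invoked.append("double_push")
        elif op[0] == "push":
            stack.append(op[1])
            invoked.append("push")
        elif op[0] == "pop":
            stack.pop()
            invoked.append("pop")
    return stack, invoked
```

For instance, three pushes followed by a pop and two more pushes are served with two coalesced operations, one plain push, and one pop.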

3. All pairs longest path problem 

The all pairs longest path (APLP) problem is to compute the longest path between all
pairs of vertices in a directed graph. The all pairs longest distance (APLD) problem is
defined similarly, with the word "path" replaced by "distance". We show that the all pairs
shortest path algorithms proposed by Takaoka [40] can be easily modified to find all pairs
longest paths in directed acyclic graphs. The first parallel algorithm solves the APLD
problem for a directed acyclic graph with unit edge costs. The second parallel algorithm
solves the APLP problem, and consequently the APLD problem, for a directed acyclic
graph with non-negative real edge costs in O(log^n) time with sub-cubic, $o(n^3)$,
cost. These results directly affect the complexity of the problem of scheduling interval
ordered tasks, described next.

4. Scheduling interval ordered tasks 

The problem of scheduling tasks is to assign a set of tasks to a set of processors
while satisfying the precedence constraints. In this thesis, we consider the problem of scheduling
n unit length tasks on m identical processors (machines) for the case where the
precedence constraint is an interval order. The precedence constraint is represented by a
partial order. An interval order is a partial order whose incomparability graph is a chordal
graph [32] (see Chapter 2 for the definitions of incomparability and chordal graphs). We
present an NC algorithm for this problem. Our algorithm constructs the same schedule
as the one produced by the sequential algorithm.

1.2 Model of Computation

To analyze algorithms, we need an abstract model of computation in which the
cost and space requirements of the algorithms for a problem can be expressed and
compared against existing ones to answer questions of efficiency. In this thesis, we
use the Parallel Random Access Machine (PRAM) as the model of computation. The PRAM
is the parallel analogue of the unit cost sequential random access machine (RAM). In
this model there are $N$ processors numbered from 0 to $N - 1$. All the processors have
access to a memory consisting of cells numbered from 0. Further, the processors are
assumed to be aware of their number, which is also called their id. Depending on how
simultaneous access to the same memory location by more than one processor is handled,
the PRAM family is classified into 3 classes as described below.

• Exclusive Read Exclusive Write (EREW): In this model, at any particular time 
instant, no more than one processor is allowed to either read from or write into the 
same memory location. The algorithm designer should write algorithms in such a 
way that, conflicts never occur. 

• Concurrent Read Exclusive Write (CREW): In this model, simultaneous read 
of a memory location by more than one processor is allowed. But, simultaneous 
write to the same memory location is forbidden. 

• Concurrent Read Concurrent Write (CRCW): This model allows both simultaneous
reading and writing of a memory location by more than one processor. According
to the method used to resolve concurrent writes, this is further sub-classified
into the following models.

- COMMON: All processors writing to the same location should write the same
value.

- PRIORITY: The smallest numbered processor succeeds in the write.

- ARBITRARY: One processor is guaranteed to succeed but no commitment
is made as to which processor succeeds.

- MINIMUM: The processor having minimum value succeeds in the write.
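The four CRCW write-resolution rules can be made concrete with a small sequential simulation of a single memory cell in a single time step. This is a sketch under assumptions: the function name `resolve_write` and the representation of a write as a `(processor_id, value)` pair are introduced here for illustration only.

```python
import random


def resolve_write(writes, rule):
    """Return the value stored after concurrent writes to one cell.

    `writes` is a non-empty list of (processor_id, value) pairs with
    distinct processor ids; `rule` names one of the CRCW sub-models.
    """
    if rule == "COMMON":
        values = {value for _, value in writes}
        if len(values) != 1:
            raise ValueError("COMMON requires all processors to write the same value")
        return values.pop()
    if rule == "PRIORITY":
        # The smallest numbered processor succeeds.
        return min(writes, key=lambda w: w[0])[1]
    if rule == "ARBITRARY":
        # Some processor succeeds; an algorithm may not rely on which one.
        return random.choice(writes)[1]
    if rule == "MINIMUM":
        # The processor writing the minimum value succeeds.
        return min(value for _, value in writes)
    raise ValueError("unknown rule: " + rule)
```

An algorithm designed for the ARBITRARY model must be correct whichever of the competing values is returned, which is why the simulation picks one at random.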

A model is said to be self-simulating if an algorithm which takes $O(t)$ time with $p$
processors can also be implemented on that model in $O(rt)$ time with $p/r$ processors
($r > 1$). The EREW, CREW, and common-CRCW models are self-simulating [26].

A parallel algorithm that runs in time $O(t(n))$ using $p(n)$ processors is said to have
a cost of $c(n) = O(t(n)\,p(n))$. If this cost equals the cost of the best known sequential
algorithm, then the parallel algorithm is said to be work optimal [26].

1.3 Organization of the Thesis 

The rest of the thesis is organized as follows. In Chapter 2, we review some algorithmic
techniques, data structures, basic definitions, notations, and complexity classes
which are used in later chapters.

In Chapter 3, we discuss the algorithms for vehicle routing problems. We present
serial algorithms for the all pairs shortest routing problem and the TRP-line problem, and
parallel algorithms for the all pairs shortest routing problem, the dSVRPTW-line problem,
the rSVRPTW-line problem, and the TRP-line problem. In Chapter 4, we present parallel
algorithms for coalescing operations with precedence constraints in real-time systems.

In Chapter 5, we give parallel algorithms for finding all pairs longest path in directed 
acyclic graphs. In Chapter 6, we present a parallel algorithm for scheduling interval 
ordered tasks. 

In Chapter 7, we offer some concluding remarks, and state some of the possible
extensions to our work.

Chapter 2
Preliminaries 


In this chapter, we describe some basic concepts that we use frequently in subsequent
chapters. In Section 2.1, we review some basic parallel algorithmic techniques, and in
Section 2.2 we give some basic definitions. We explain the shortest path algorithm
for layered directed acyclic graphs in Section 2.3, and the Fibonacci heap data structure
in Section 2.4. We give an introduction to asymptotic notation and some complexity
classes of problems in Section 2.5.

2.1 Basic Parallel algorithmic techniques 

Prefix Computation. Consider a sequence of $n$ elements $\{x_1, x_2, \ldots, x_n\}$ from a
set $S$, on which a binary associative operator $\otimes$ is defined. The problem of prefix
computation is defined as finding the $n$ "sums"
$s_i = x_1 \otimes x_2 \otimes \cdots \otimes x_i$, $1 \le i \le n$.

This problem is solved by organizing the computation in the form of a balanced
binary tree. The prefix computation problem can be solved in $O(\log n)$ time with linear
work $O(n)$ on an EREW PRAM [28]. Prefix computation is the most frequently used
subroutine in parallel algorithms. We use the following variations of prefix computation
in the later chapters.

• Prefix sum: Taking the binary associative operator to be the usual summation
operator ($+$), we get the prefix sum. The sum of the $n$ numbers is given by $s_n$.

• Prefix maxima: Taking the binary associative operator to be the usual maximum
operator, we get the prefix maxima. The maximum of the $n$ numbers is given
by $s_n$.
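The balanced-binary-tree organization of prefix computation can be sketched sequentially as follows. Each level of recursion corresponds to one level of the tree: adjacent pairs are combined, the prefix problem is solved on the half-size sequence, and the results are expanded back. On a PRAM the two list-building steps at each level would run in parallel; here they are ordinary loops. The function name `prefix_scan` is an assumption made for this sketch.

```python
from operator import add


def prefix_scan(xs, op):
    """Return [x_1, x_1 op x_2, ..., x_1 op ... op x_n] for an associative op."""
    n = len(xs)
    if n <= 1:
        return list(xs)
    # Up the tree: combine adjacent pairs (one parallel step on a PRAM).
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, n - 1, 2)]
    # Recurse on the half-size problem; sub[k] covers x_1 .. x_{2k+2}.
    sub = prefix_scan(pairs, op)
    # Down the tree: fill in the remaining positions (one parallel step).
    out = [xs[0]] * n
    for i in range(1, n):
        if i % 2 == 1:
            out[i] = sub[i // 2]
        else:
            out[i] = op(sub[i // 2 - 1], xs[i])
    return out
```

Calling `prefix_scan(values, add)` gives the prefix sums and `prefix_scan(values, max)` gives the prefix maxima; the operand order is preserved, so the sketch also works for associative operators that are not commutative.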

Generalized Prefix Computation 

Definition 1 [37] Let $f[m]$ and $y[m]$ be given sequences of elements for integers
$1 \le m \le n$. Suppose there is an arbitrary binary associative operator $\otimes$
defined on the elements $f[\,]$, and that the elements $y[\,]$ can be compared by a linear order
$\prec$. Then the problem is to compute the sequence of general prefixes

$p[m] = f[j_1] \otimes f[j_2] \otimes \cdots \otimes f[j_k]$,

where $j_1 < j_2 < \cdots < j_k$ and $\{j_1, j_2, \ldots, j_k\}$ is the set
of indices $j < m$ for which $y[j] \prec y[m]$, for each index $m$ from 1 to $n$.
Generalized prefix computation can be done in $O(\log n)$ time with $n$ processors on a
CREW PRAM [37].
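To make Definition 1 concrete, the following sketch evaluates the general prefixes directly from the definition. It is a quadratic-time sequential reference, not the $O(\log n)$ parallel algorithm of [37]; the function name `general_prefix`, the 0-based indexing, and the use of `None` for an empty index set are assumptions made here.

```python
from operator import add, lt


def general_prefix(f, y, op, less):
    """For each m, combine f[j] over all j < m with less(y[j], y[m]).

    Indices are combined in increasing order, as in the definition.
    Returns None at positions where no index qualifies.
    """
    result = []
    for m in range(len(f)):
        acc = None
        for j in range(m):
            if less(y[j], y[m]):
                acc = f[j] if acc is None else op(acc, f[j])
        result.append(acc)
    return result
```

With `f = [1, 2, 3, 4]`, `y = [1, 3, 2, 4]`, addition as the operator and `<` as the order, the last position combines all three earlier elements, while the third position only picks up `f[0]` because `y[1]` is not smaller than `y[2]`.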

Merging. Given two sorted arrays $A[0..n-1]$ and $B[0..n-1]$, the problem is to merge
them into a single sorted array. If comparison is the only allowed operation on the elements,
then merging can be done in $O(\log\log n)$ time on a CREW PRAM [26].

Nearest larger problem. Given an integer array $A[1..n]$, the problem is to find,
for each element, the nearest larger element. This problem can be solved in parallel on a
CREW PRAM [6].
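As an illustration of the nearest larger problem, the sketch below computes, for every position, the index of the nearest strictly larger element to its left and to its right using a stack. This is a standard linear-time sequential method shown only to fix the problem statement; it is not the parallel algorithm of [6], and reporting both directions with `None` for "no such element" is an assumption made here.

```python
def nearest_larger(a):
    """Return (left, right) lists of indices of the nearest strictly larger elements."""
    n = len(a)

    def one_side(indices):
        nearest = [None] * n
        stack = []  # indices whose values are strictly decreasing
        for i in indices:
            # Discard candidates that are not strictly larger than a[i].
            while stack and a[stack[-1]] <= a[i]:
                stack.pop()
            nearest[i] = stack[-1] if stack else None
            stack.append(i)
        return nearest

    left = one_side(range(n))
    right = one_side(range(n - 1, -1, -1))
    return left, right
```

For the array `[3, 1, 4, 1, 5]`, the element at index 1 has nearest larger neighbours at indices 0 and 2, while the last element has none on either side.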

2.2 Basic Graph Definitions 

Definition 2 [32] The incomparability graph of a partial order $P = (V, A)$ is a graph
$G = (V, E)$, where $(v, u) \in E$ iff $(v, u) \notin A$ and $(u, v) \notin A$. The complements of
incomparability graphs have a transitive orientation, i.e. an assignment of a direction to each
of the edges such that the resulting directed graph is a partial order.

Definition 3 [20] A chordal graph is one in which any circuit $[v_1, v_2, \ldots, v_k]$, $k \ge 4$,
possesses a chord, i.e. an edge $(v_i, v_j)$ with $j \not\equiv i \pm 1 \pmod{k}$.

Definition 4 [20] An interval graph $G = (V, E)$ is one whose nodes are closed intervals
on the real line, and $(v, u) \in E$ iff $v \cap u \ne \emptyset$. It is well-known that interval graphs are
chordal.



2.3 Shortest path algorithm for Layered Directed Acyclic Graph 

An $n$-layered directed acyclic graph (LDAG) is a graph with vertices lying on layers; the
edges go from nodes at layer $i$ to nodes at layer $i + 1$ only. A simple example of a layered graph is
a grid graph, as shown in Figure 1.




Figure 1: An example of a layered graph

The LDAG shown in Figure 1(a) can be re-drawn as in Figure 1(b). The graph has
$l$ rows, $(n - 1)$ columns, and $n - 1$ layers, and the maximum number of nodes in any layer
is $b = \min\{l, n - 1\}$. Edges from nodes at layer $i$ lead to nodes at layer $i + 1$ only. If
an adjacency list is used for storing the edges, the edges to the next level can be stored in an
array of length 2b. An element {i,j) will be in layer {i + j), and it is in position in 
array if 1 < i + j < L or in position I — i, otherwise. 

For finding the shortest path between $s$ and $t$, the vertices in alternate layers are
removed iteratively. The number of layers is halved in each iteration. Thus, the
algorithm takes $O(\log n)$ iterations. If there is an edge from node $x$ at layer $i - 1$
to node $y$ at layer $i$, and another edge from node $y$ to node $z$ at layer
$i + 1$, then the two edges can be replaced by a single edge from node $x$ to node $z$
whose weight is the sum of the weights of the original edges (keeping the minimum
such weight when several intermediate nodes connect $x$ and $z$). Saxena [35] has shown that
we can find the shortest path in $O(\log n \log b)$ time with $O(n b^3 / \log b)$ processors on
a CREW PRAM, and in $O(\log n \log\log b)$ time with $O(n b^3 / \log\log b)$ processors on a
common-CRCW PRAM.
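The layer-removal idea can be sketched sequentially by storing the edges between consecutive layers as weight matrices and repeatedly replacing adjacent pairs of matrices by their min-plus product. This is only an illustration under assumptions: the matrix representation, the names `min_plus` and `ldag_shortest`, and the use of `float("inf")` for a missing edge are choices made here, and the loops stand in for steps that would run in parallel on a PRAM.

```python
def min_plus(a, b):
    """Min-plus product: entry (x, z) is the minimum over y of a[x][y] + b[y][z]."""
    return [
        [min(a[x][y] + b[y][z] for y in range(len(b))) for z in range(len(b[0]))]
        for x in range(len(a))
    ]


def ldag_shortest(mats):
    """Contract a layered DAG given as a list of inter-layer weight matrices.

    mats[k][x][y] is the weight of the edge from node x in layer k to
    node y in layer k + 1 (float("inf") if there is no such edge).
    Returns the matrix of shortest distances from the first to the last layer.
    """
    mats = list(mats)
    while len(mats) > 1:
        # Remove alternate layers: each adjacent pair of matrices becomes one.
        merged = [min_plus(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2 == 1:
            merged.append(mats[-1])
        mats = merged
    return mats[0]
```

Each pass of the `while` loop roughly halves the number of layers, which mirrors the $O(\log n)$ iterations described above.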

2.4 Fibonacci heaps 

In this section, we describe the Fibonacci heap (F-heap) data structure introduced by
Fredman and Tarjan [17].

Definition 5 A heap is an abstract data structure consisting of a set of items, each with
a real-valued key.

The following operations are defined on it.

MakeHeap(): return a new empty heap.

Insert(i, h): insert a new item i, with a predefined key, into heap h.

FindMin(h): return an item of minimum key in heap h.

DeleteMin(h): delete and return an item of minimum key from heap h.

Meld(h1, h2): return the heap formed by taking the union of the item-disjoint
heaps h1 and h2. This operation destroys both h1 and h2.

DecreaseKey(δ, i, h): decrease the key of item i in heap h by subtracting the non-negative
real number δ. This operation assumes that the position of
i in h is known.

Delete(i, h): delete an arbitrary item i from h. This operation assumes that the
position of i in h is known.

Definition 6 A Fibonacci heap is a collection of item-disjoint heap-ordered trees. Rank(x)
denotes the number of children of a node x. In the F-heap data structure, each node
has a field to store its rank. Moreover, a node contains a bit to mark or unmark that
node. All nodes are initially unmarked.

Each node has a pointer to its parent and a pointer to one of its children. The children
of any node are doubly linked in a circular linked list. All roots of trees are also doubly
linked in a circular linked list. There is a pointer to the root containing the item of
minimum key. An F-heap always has to maintain two invariants:

• No two roots can have the same rank

• No non-root can lose two children





To maintain the first invariant we keep an array root[ ] of pointers: root[i] points to the root
of rank i, if one exists; otherwise it points to null. The basic heap operations on an F-heap
are as follows:

MakeHeap(): return a pointer to null.

FindMin(h): return the item in the minimum node.

Meld(h1, h2): combine the root lists of h1 and h2 into a single list, and return the root
containing the smaller key. Carry out the linking step described later.

Insert(i, h): create a new heap containing item i, and replace h by melding h with
the new heap. Carry out the linking step described later.

Delete(i, h): let x be the node containing item i. If x is not the minimum node, then

a) Form a new list of roots by concatenating the list of children(x)
with the original list of roots. Make rank(x) equal to zero,
effectively removing all its children from it. Carry out the linking
step described later.

b) Call the procedure for cutting the edge joining x to its parent.

c) Remove x from the list of roots, and destroy node x.

DecreaseKey(δ, i, h): decrease the key of item i. If i was a root, check whether i is now the
minimum node. Otherwise, if heap order is violated, call the
procedure for cutting the edge joining i to its parent.

Procedure for cutting the edge joining x to its parent. Insert x in the doubly
linked list of roots, and check for cascading cuts as follows:

• if parent(x) is unmarked, then mark it and decrease its rank by one.

• if parent(x) is marked, then first decrease its rank by one, and then recursively
call the procedure for cutting the edge joining parent(x) to parent[parent(x)].

Thus, if a node loses two of its children through cuts, we cut the edge joining it
to its parent (cascading cut), making it a new root. All these operations can be done
in O(1) amortized time¹. DeleteMin(h) is the only operation that cannot be done
trivially in O(1) time. It takes O(log n) amortized time¹. The algorithm for this is as
follows:

1. Remove the minimum node, say x, from h.

2. Concatenate the list of children(x) with the list of roots of h other than x, and
repeat the linking step until it no longer applies.

3. Form a list of the remaining roots, in the process finding the minimum node.

4. Save the value stored in item x, destroy item x, and return the value.

Linking step. Find any two trees with the same rank and link them. If x is made a
child of y, then unmark(x).

¹ Amortized complexity of an operation is O(g(n)) if, for a sequence of k (sufficiently large, k > n)
operations, the total time required by these operations is O(k g(n)) [2].
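The DeleteMin procedure and the linking step can be sketched as follows. This is a simplified illustration, not the full data structure of [17]: Python lists replace the circular doubly linked lists, a dictionary keyed by rank plays the role of the root[ ] array, the root with the larger key is assumed to become the child when two trees are linked, and DecreaseKey with cascading cuts is omitted. The names `Node`, `link`, `consolidate` and `delete_min` are assumptions made here.

```python
class Node:
    """A heap-ordered tree node; its rank is the number of children."""

    def __init__(self, key):
        self.key = key
        self.children = []
        self.marked = False


def link(x, y):
    """Linking step for two roots of equal rank; returns the new root."""
    if y.key < x.key:
        x, y = y, x
    y.marked = False  # a node that becomes a child is unmarked
    x.children.append(y)
    return x


def consolidate(roots):
    """Repeat the linking step until no two roots have the same rank."""
    by_rank = {}  # plays the role of the root[ ] array
    for node in roots:
        while len(node.children) in by_rank:
            node = link(node, by_rank.pop(len(node.children)))
        by_rank[len(node.children)] = node
    return list(by_rank.values())


def delete_min(roots):
    """Remove the minimum root; return its key and the new list of roots."""
    x = min(roots, key=lambda r: r.key)
    remaining = [r for r in roots if r is not x] + x.children
    return x.key, consolidate(remaining)
```

Starting from five single-node trees and calling `delete_min` repeatedly returns the keys in increasing order, and after every call no two roots share a rank.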

2.5 Analyzing Algorithms 

2.5.1 Order Notation 

In this section, we describe the notations we will be using to express the time, processor,
and work complexities of algorithms [14].

1. The notation $f = \Theta(g)$ (read as "f is theta of g") is used iff there are positive
constants $C_1$, $C_2$ and a positive integer $N$ such that
$|C_1 g(n)| \le |f(n)| \le |C_2 g(n)|$ for all $n \ge N$.

2. The notation $f = O(g)$ (read as "f is oh of g") is used iff there is a positive
constant $C$ and a positive integer $N$ such that $0 \le |f(n)| \le |C g(n)|$ for all
$n \ge N$.

3. The notation $f = \Omega(g)$ (read as "f is omega of g") is used iff there is a positive
constant $C$ and a positive integer $N$ such that $0 \le |C g(n)| \le |f(n)|$ for all
$n \ge N$.

4. The notation $f = o(g)$ (read as "f is little-oh of g") is used iff for every positive
constant $C$ there is a positive integer $N$ such that $0 \le |f(n)| < |C g(n)|$ for all
$n \ge N$.

5. The notation $f = \omega(g)$ (read as "f is little-omega of g") is used iff for every
positive constant $C$ there is a positive integer $N$ such that $0 \le |C g(n)| < |f(n)|$
for all $n \ge N$.

2.5.2 Complexity classes 

Whenever a new problem is confronted, the first question that arises is "Can it be solved
in polynomial time sequentially?". If the answer is yes, then the second question will
be "Can it be solved in polylogarithmic time or constant time in parallel?". To answer
these questions, the algorithm designer needs to have some basic understanding of
the different classes of problems. In this section, we define some complexity classes of
problems.

Definition 7 [4] Class NC. A problem is said to be in class NC if there exists an
algorithm to solve the problem in polylogarithmic time on a PRAM using a polynomial
number of processors.

Definition 8 [4] Class Random NC. A problem is said to be in class random NC (RNC)
if there exists a randomized algorithm to solve the problem in polylogarithmic time on a
PRAM using a polynomial number of processors.

Definition 9 [4] Class P. A problem is said to be in class P if there exists a sequential
algorithm to solve the problem in polynomial time.

Definition 10 [4] Class P-complete. A problem is said to be P-complete if it satisfies
the following conditions.

1. It is in class P.

2. It can be solved in polylogarithmic time on a PRAM iff all P-complete problems
can be solved in polylogarithmic time on a PRAM.

Definition 11 [24] Class NP. A problem is said to be in class NP if there exists a
nondeterministic sequential algorithm for solving the problem in polynomial time.

Definition 12 [24] Class NP-complete. A problem is said to be NP-complete if it
satisfies the following conditions.

1. It is in class NP.

2. It can be solved in polynomial time iff all the NP-complete problems can be solved in
polynomial time sequentially.

Definition 13 [24] Class NP-hard. A problem is said to be NP-hard if solving it
in polynomial time implies that all NP-complete problems can be solved in polynomial time.

We can easily observe that all NP-complete problems are NP-hard, but not all NP-hard
problems are NP-complete. Garey and Johnson [18] provide a comprehensive treatment
of NP-complete and NP-hard problems.



Chapter 3 

Parallel Algorithms For Vehicle 
Routing Problems 


3.1 Introduction 

Vehicle routing problems involve the navigation of one or more vehicles through a network
of locations, with each location (or node) being serviced. Locations are associated with
handling times as well as time windows during which they are active. The handling time
is the time it takes to service that node. The time window can be specified as a release
time and deadline pair, giving the earliest and latest times at which the node can be
serviced. The arcs connecting locations have time costs associated with them. A vehicle
navigating through a network travels through a sequence of locations; such a sequence is
called a route. The length of a route is the sum of the handling times of the nodes on the route
and the travel times along the edges on the route. However, if the vehicle arrives at a
node on the route before the release time of that node, it must wait until the release time
before proceeding through the node; these wait times are added to the route's
length. If the vehicle arrives at a node past its deadline, the route is not feasible.
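The definition of route length, including waiting and feasibility, can be sketched as a short function. The names `route_length`, `travel`, `handling`, `release` and `deadline`, the dictionary-based inputs, the start time of zero, and the convention of returning `None` for an infeasible route are all assumptions made for this illustration.

```python
def route_length(route, travel, handling, release, deadline):
    """Return the length of a route, or None if the route is not feasible.

    route    : list of nodes in visiting order
    travel   : dict mapping (u, v) to the travel time of arc (u, v)
    handling : dict mapping node to handling time
    release  : dict mapping node to release time
    deadline : dict mapping node to deadline
    """
    time = 0
    previous = None
    for node in route:
        if previous is not None:
            time += travel[(previous, node)]
        if time > deadline[node]:
            return None  # arrived past the deadline
        time = max(time, release[node])  # wait until the release time if early
        time += handling[node]
        previous = node
    return time
```

In the usage below, the vehicle reaches node b at time 3 but must wait until its release time 5, so the route a, b, c has length 10; tightening the deadline of c to 7 makes the same route infeasible.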

In this chapter, we develop parallel algorithms for the all pairs shortest routing (APSR) problem,
the single vehicle routing problem (SVRP), and the traveling repairman problem (TRP). In the
first problem, the all pairs shortest routing problem, we compute the shortest route 
between all pairs of locations. Our algorithm runs in time O(log * n) time on a CREW 
PRAM using O processors. The best known previous algorithm given by Gupta
et al. [21] runs in 0(!og^ n) time using 0{n^) processors on a CREW PRAM.

A network is said to be a line if all locations in the network lie on a line. In the second
problem, the single vehicle routing problem, we compute the shortest route visiting
all locations, starting from a particular location called the depot. Here the network is a line,
and every location has either a release time or a deadline. Our algorithms for
these problems take 0(log®n) time using 0 (i^^) processors on a CREW PRAM. The 
best known previous algorithms given by Gupta et al. run in O(log-* n) time with 0(n®) 
processors on a CREW PRAM. In Gupta et al. [21] they have wrongly stated that their 
algorithms for the APSR problem and the SVRP problems run in (;(log-/!) time^. 

The third problem, the traveling repairman problem, is similar to the second one,
but here we try to find a route visiting all locations such that the sum of the waiting times
at all locations is minimum. In this problem, the network is a line and locations
do not have any time windows associated with them. We show that this problem can be
reduced to shortest path algorithm in layered graph and give an O(log"/r) time parallel 
algorithm with 0 ({—) processors on a CREW PRAM. 

These problems arise in the service sector in applications such as garbage collection 
and postal delivery, in the commercial sector in the transportation of goods through 
road and rail, and in the industrial sector in material handling systems in manufacturing. 
In addition, these problems arise in experimental applications such as automatic vehicle 
routing and robot arm movement[21]. 

The problem of navigating a vehicle in a city arises in a variety of guises. Each
street has a time cost for travel on the street, each intersection has an expected delay,
and some intersections or streets might be open only at certain times. The vehicle routing
problem requires a vehicle to travel as quickly as possible from one location to another
while respecting these timing constraints. Alternatively, the vehicle may be required to service
a number of locations in the city in the minimum time, again respecting the timing
constraints [21].

The vehicle routing problem (VRP) involves the design of a set of minimum cost
routes, originating and terminating at a central depot, for a fleet of vehicles which services
a set of customers with known demands [36]. Each customer is serviced exactly once
and, furthermore, all the customers must be assigned to vehicles such that the vehicle
capacities are not exceeded. Bodin et al. [8] provide a comprehensive survey of the VRP
and its variations, which also describes the many practical occurrences of these problems.

* They have stated that the composition of two tables, or finding the minimum of two tables,
requires O(1) time. But these operations require O(log n) time.

In the VRP with time windows (VRPTW), the above issues have to be dealt with under
the added complexity of time windows. Time windows specify the deadlines and earliest
service times of customers. The service of a customer, involving pickup and/or delivery
of goods, can begin within the time window defined. These time windows can be
soft or hard. In the case of hard windows, the service has to start within the time window.
However, in the case of soft windows, the service time can violate the time window at some
penalty. Solomon et al. [36] provide a survey of time constrained routing and scheduling
problems.



                                 Zero processing times        General processing times

No release times or deadlines    Trivial                      Trivial

Release times only               $O(n^2)$ [33]                NP-complete [41]

Deadlines only                   $O(n^2)$ [41]                ?

General time windows             Strongly NP-complete [41]    Strongly NP-complete [19]


Table 1: The complexity of special cases of SVRPTW-line (n is the number of jobs)

SVRP is the case of VRP in which there is only one vehicle, and it is NP-complete
even if the inter-point distance metric is restricted to be Euclidean [31]. Introducing time
constraints on the problem (SVRPTW) can only make it harder [34]. There are some
polynomial time algorithms when we restrict the underlying network to a straight line and
the vehicle is uncapacitated. Psaraftis et al. [33] study this variant where nodes have
zero handling times and all time windows have only release time constraints. Tsitsiklis
[41] studies a similar problem, except that the time window for each node has only a
deadline constraint. We call these the SVRPTW-line problem with release times (rSVRPTW)
and the SVRPTW-line problem with deadlines (dSVRPTW) respectively. In both cases,
the authors provide an $O(n^2)$ time algorithm for their respective problems based on a
dynamic-programming formulation. Tsitsiklis in addition shows that the SVRPTW-line
problem with general time windows but zero handling times is NP-hard. He also shows
that SVRPTW-line with release time constraints and non-zero handling times is
NP-hard. Karuno et al. [27] consider the case in which the underlying network is a tree,
time windows have only release time constraints, and each node is allowed a non-zero
handling time. They show that the problem is NP-hard and provide an approximation
algorithm with a performance ratio of two.



                                 Zero processing times        General processing times

No release times or deadlines    $O(n^2)$ [1]                 ?

Release times only               ?                            Strongly NP-complete [29]

Deadlines only                   NP-complete [1]              NP-complete [41]
                                 pseudo polynomial            NP-complete [41]

General time windows             Strongly NP-complete [41]    Strongly NP-complete [19]


Table 2: The complexity of special cases of TRPTW-line (n is the number of jobs)

The traveling repairman problem (TRP) is a variation of the well known traveling salesman
problem in which, instead of minimizing the total completion time of the salesman's
tour, one tries to minimize the sum of the waiting times of the locations or customers [1].
TRP captures the waiting costs of a service system from the customers' point of view and
it can be used to model numerous types of service systems. While the general problem
is NP-complete [1], some progress has been made when the network is a straight line.
Afrati et al. [1] have given an $O(n^2)$ time algorithm when the handling time of each location
is zero and there are no time bounds associated with locations [41]. We call this the
TRP-line problem.

In Section 3.2, we give definitions and formal description of problems considered. 
Section 3.3 provides a description and analysis of the parallel algorithm for APSR prob- 
lem. In Section 3.4, we describe algorithms for rSVRPTW-line and dSVRPTW-line 
problems. In Section 3.5, we give an algorithm for TRP-line problem. 
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3.2 Preliminaries 


The underlying network can be represented as a graph $G(V, E)$, where the set of nodes
$V$ are locations, and the set of edges $E$ are links between locations. Each edge $(u, v)$
has a weight $t(u, v)$ associated with it, the time required for the vehicle to traverse this
edge, called the travel time. Each node $v$ has a handling time $h(v)$ that denotes the
time required for the vehicle to service the node, and a time window associated with it. The
vehicle can pass through the node on the way to some other node on the route or actually
handle the node during its time window. The time window can be specified as a release
time $r(v)$ and deadline time $d(v)$ pair, where these are the earliest and latest times
that the node can be serviced. Further, we assume $0 \le r(v) \le d(v) \le \infty$ for all nodes
$v \in V$ and that all weights are positive and integral. The closed interval $[r(v), d(v)]$
denotes the time window of $v$.

A network $G$ is called a line if its nodes can be ordered as $v_1, \ldots, v_n$ such that
there is an edge from $v_i$ to $v_{i+1}$ for $1 \le i < n$, and there are no other edges. The
travel time definition is extended to be the length of the shortest path from $u$ to $v$. A route
is a sequence of nodes serviced in the order specified in the sequence. A route is feasible if
its cost is defined; otherwise it is infeasible. The cost of a route at a particular starting
time is defined only if the vehicle can traverse the nodes of the route with every
node being serviced no later than its deadline time. The cost of a route can be formally
defined as follows.

Definition 14 [21] A route is a sequence of nodes $v_0, v_1, \ldots, v_s$ such that if $i \ne j$ then
$v_i \ne v_j$. The cost of the route $C_T(v_0, v_1, \ldots, v_s)$ is a function of the starting time $T$,
and is defined inductively:

1. If $T > d(v_0)$, then $C_T(v_0)$ is undefined;

2. If $T < r(v_0)$, then $C_T(v_0) = r(v_0) + h(v_0)$;

3. If $r(v_0) \le T \le d(v_0)$, then $C_T(v_0) = T + h(v_0)$;

4. If $C_T(v_0, v_1, \ldots, v_i)$ is undefined, then $C_T(v_0, v_1, \ldots, v_{i+1})$ is undefined;

5. If $C_T(v_0, v_1, \ldots, v_i) + t(v_i, v_{i+1}) > d(v_{i+1})$, then $C_T(v_0, v_1, \ldots, v_{i+1})$ is undefined;

6. If $C_T(v_0, v_1, \ldots, v_i) + t(v_i, v_{i+1}) < r(v_{i+1})$, then $C_T(v_0, v_1, \ldots, v_{i+1}) = r(v_{i+1}) + h(v_{i+1})$;

7. If $r(v_{i+1}) \le C_T(v_0, v_1, \ldots, v_i) + t(v_i, v_{i+1}) \le d(v_{i+1})$, then
$C_T(v_0, v_1, \ldots, v_{i+1}) = C_T(v_0, v_1, \ldots, v_i) + t(v_i, v_{i+1}) + h(v_{i+1})$.

A route from $v_0$ to $v_s$ is optimal if there is no route from $v_0$ to $v_s$ with smaller cost. It
is covering if every node of $G$ appears in the route. The problems considered in this
chapter are defined as follows.

Definition 15 The all pairs shortest routing problem (APSR) determines, for each
pair of nodes $u$ and $v$ in $G$, the optimal route between $u$ and $v$ for each possible starting
time of the vehicle, starting from node $u$ and servicing all nodes in the route within
their respective time windows.

Definition 16 The single vehicle routing problem with time window constraints (SVRPTW)
on $G$ finds the optimal covering route of $G$ that starts at the depot $S(G)$ at time 0 and
ends at $T(G)$, where $T(G)$ can be any node.

Lemma 1 [21] SVRP is NP-hard even for $h(v) = r(v) = 0$ and $d(v) = \infty$ for every
$v \in V(G)$.

The proof of this lemma follows by a reduction from TSP. In this chapter, we only
consider SVRPTW problems for which $h(v) = 0$, with the underlying network a line. Further,
we discuss only SVRPTW-line problems having only release times (i.e. $d(v) = \infty \ \forall v$),
and SVRPTW-line problems having only deadlines (i.e. $r(v) = 0 \ \forall v$). These are called
rSVRPTW-line and dSVRPTW-line respectively.

Definition 17 The traveling repairman problem (TRP) on $G$ finds the covering route
of $G$ that starts at the depot $S(G)$ at time 0 and ends at $T(G)$, having the minimum sum of
waiting times of nodes. The waiting time of a node is the difference between its release time
and the actual time at which the vehicle services the node.

In this chapter, we discuss the TRP-line problem, for which all nodes in the network have
zero handling time and there are no time windows associated with them.
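
The inductive cost of Definition 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `route_cost`, the dictionary-based inputs, and the three-node example network are hypothetical and do not come from the thesis; `None` is used to represent an undefined (infeasible) cost.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple

Node = Hashable


def route_cost(route: Sequence[Node], start: float,
               r: Dict[Node, float], d: Dict[Node, float], h: Dict[Node, float],
               t: Dict[Tuple[Node, Node], float]) -> Optional[float]:
    """Cost C_T(route) following Definition 14; None means 'undefined' (infeasible)."""
    first = route[0]
    if start > d[first]:                       # rule 1: the first deadline is already missed
        return None
    cost = max(start, r[first]) + h[first]     # rules 2 and 3: wait for the release time if early
    for prev, nxt in zip(route, route[1:]):
        arrival = cost + t[(prev, nxt)]
        if arrival > d[nxt]:                   # rule 5 (rule 4 is the early return itself)
            return None
        cost = max(arrival, r[nxt]) + h[nxt]   # rules 6 and 7
    return cost


# Hypothetical three-node line v0 - v1 - v2 with zero handling times.
r = {"v0": 0, "v1": 5, "v2": 0}
d = {"v0": 10, "v1": 8, "v2": 12}
h = {"v0": 0, "v1": 0, "v2": 0}
t = {("v0", "v1"): 2, ("v1", "v2"): 3}

print(route_cost(["v0", "v1", "v2"], 0, r, d, h, t))   # 8: the vehicle waits at v1 until time 5
print(route_cost(["v0", "v1", "v2"], 7, r, d, h, t))   # None: v1 is reached after its deadline
```

Starting at time 0 the vehicle reaches v1 at time 2, waits for the release time 5 and finishes at time 8; starting at time 7 it reaches v1 at time 9, after the deadline 8, so the route is infeasible for that starting time.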

3.3 All pairs shortest routing problem 

The handling time term can be eliminated by simply adding $h(v)$ of node $v$ to the travel
time on each edge out of $v$ and assuming that the networks do not have handling times
[21]. In Section 3.3.1, we describe a sequential algorithm for the problem of finding a
shortest route from a node $S$ to a node $T$ using the Fibonacci heap data structure given
by Fredman and Tarjan [17]. This in fact finds the shortest route from node $S$ to all
other nodes in $G$. This is called the S-T routing problem [21]. The algorithm described has
$O(m + n \log n)$ time complexity, where $m$ is the number of edges and $n$ is the number
of nodes in the network. This is an improvement over the $O(n^2)$ time algorithm given
by Gupta et al. [21]. In Section 3.3.2, we give a parallel algorithm for the APSR problem.

3.3.1 Sequential algorithm for the APSR problem 

Let $G(V, E)$ be a network with time windows specified at each node $v \in V$ and travel
times specified on each edge $(u, v) \in E$. Let $n$ be the number of nodes and $m$ be the number of
edges. Gupta et al. have described an $O(n^2)$ algorithm for this problem. Our algorithm
proceeds in a manner analogous to that of the Fibonacci heap (F-heap) implementation of
Dijkstra's algorithm.

1. Un-label all the vertices $v \in G$.

2. The cost function $T$ is defined as follows: for the source $S$, $T(S) = 0$; for all other
vertices $u$, $T(u) = \infty$ and $prev(u) = S$.

3. For all neighbors $v$ of $S$, we update $T$ as follows:
$T(v) = \max\{r(v), T(S) + t(S, v)\}$;
if $T(S) + t(S, v) > d(v)$ then $T(v) = \infty$.

4. while ($\exists v \in V$ which is un-labeled)

(a) Select an un-labeled vertex $v$ having minimum $T(v)$. Mark it.

(b) For each edge $(v, w)$ with $T(v) + t(v, w) \le d(w)$, set
$T(w) = \min\{T(w), \max\{r(w), T(v) + t(v, w)\}\}$. If $T(v) + t(v, w) > d(w)$, the edge
cannot be used and $T(w)$ is left unchanged.
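
The steps above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it uses Python's binary heap (`heapq`) with lazy deletion instead of a Fibonacci heap, so it runs in $O(m \log n)$ rather than $O(m + n \log n)$ time. The function name, the adjacency-list format and the four-node example network are assumptions made for the sketch, and handling times are taken to be zero as in the text.

```python
import heapq
import math


def earliest_service_times(adj, r, d, source, start=0):
    """Dijkstra-style labeling with time windows.

    adj maps each node to a list of (neighbor, travel_time) pairs; r and d map
    nodes to release times and deadlines. Returns (T, prev), where T[v] is the
    earliest service time of v (math.inf if v cannot be served in its window).
    """
    T = {v: math.inf for v in adj}
    prev = {v: None for v in adj}
    T[source] = start
    heap = [(start, source)]
    labeled = set()
    while heap:
        tv, v = heapq.heappop(heap)
        if v in labeled:
            continue                      # stale heap entry (stands in for decrease-key)
        labeled.add(v)
        for w, travel in adj[v]:
            arrival = tv + travel
            if arrival > d[w]:
                continue                  # deadline of w missed along this edge
            candidate = max(r[w], arrival)    # wait at w until its release time
            if candidate < T[w]:
                T[w] = candidate
                prev[w] = v
                heapq.heappush(heap, (candidate, w))
    return T, prev


# Hypothetical example network.
adj = {"S": [("a", 2), ("b", 5)], "a": [("b", 1), ("T", 10)], "b": [("T", 2)], "T": []}
r = {"S": 0, "a": 3, "b": 0, "T": 0}
d = {"S": math.inf, "a": 10, "b": 6, "T": 9}

T, prev = earliest_service_times(adj, r, d, "S")
print(T)       # {'S': 0, 'a': 3, 'b': 4, 'T': 6}
```

In the example the vehicle waits at a until time 3, reaches b at time 4 and T at time 6; the direct edge from a to T would arrive at time 13, after the deadline of T, and is skipped.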

Lemma 2 The sequential algorithm for the S-T routing problem takes $O(m + n \log n)$ time.
Proof: If all un-labeled vertices are kept in an F-heap then Step 4(a) requires a delete
minimum operation, and Step 4(b) requires a decrease key operation; in that case we
also set $prev(w) = v$. Moreover, for every edge $(v, w)$, Step 4(b) is performed exactly
once. Thus Step 4(b) will be performed exactly $m$ times. Moreover, Step 4(a) will be
performed exactly once for each vertex. Thus, we have $n$ delete minimum operations
and $m$ decrease key operations on an F-heap data structure. A delete minimum
operation takes $O(\log n)$ amortized time and a decrease key operation requires $O(1)$
amortized time [17]. Thus the total time taken by our algorithm is $O(m + n \log n)$. □

Theorem 1 The sequential algorithm for the APSR problem takes $O(nm + n^2 \log n)$ time.
Proof: The proof of the theorem follows from the fact that the APSR problem can be
solved by solving the S-T routing problem once for each vertex $v$ in $G$, i.e. $n$ times. □

3.3.2 Parallel Algorithm for the APSR problem 

For the parallel case, we must keep track of significantly more information at each step.
Suppose we are trying to compute an optimal S-T route for a pair of nodes $S$ and $T$ of
$G$. Let $S, v_1, \ldots, v_k, T$ be such an optimal S-T route. Then, by starting at $v_{k/2}$ and $S$
simultaneously, we can look for optimal paths from $S$ to $v_{k/2}$ starting at time 0 and from
$v_{k/2}$ to $T$ starting at time $C(S, v_1, \ldots, v_{k/2})$. We can now recursively solve the
$S$-$v_{k/2}$ and the $v_{k/2}$-$T$ routing problems.

There are two reasons why we cannot directly apply this technique. First, the routes
$S$-$v_{k/2}$ and $v_{k/2}$-$T$ cannot be computed independently: because of time windows, we need
$C(S, v_1, \ldots, v_{k/2})$ to compute the optimal $v_{k/2}$-$T$ route. Second, we do not know which
node of $G$ is $v_{k/2}$. Trying all possibilities may result in an algorithm requiring $O(n^{\log n})$
processors. We compute the optimal route as a function of time to address the first
issue, and use a variant of pointer doubling to handle the second [21].

3.3.2.1 Cost Vectors 

A function $T_{u,v}$ is associated with every pair $(u, v) \in E(G)$; $T_{u,v}(t)$ denotes the cost of
an optimal u-v route starting at time $t$.

For the S-T routing problem, the optimal cost of the $v_{k/2}$-$T$ route is computed as a
function of $T_{S,v_{k/2}}$, and in parallel the cost of the optimal $S$-$v_{k/2}$ route is also computed.
By evaluating the function at the actual cost, we have the cost of the optimal S-T route.

Definition 18 [21] For every $(u, v) \in E(G)$ and $t \ge 0$, let $P_{u,v}(t)$ denote an optimal
cost route from $u$ to $v$ when the vehicle starts from $u$ at time $t$. The cost vector of the
optimal route from $u$ to $v$ is a function $T_{u,v} : N \to N$ such that $T_{u,v}(t)$ is the cost of
$P_{u,v}(t)$.

The cost vector is monotonic in $t$ since the vehicle must wait at nodes whose time
windows are not yet open. Figure 2 shows an example of a cost function.

Figure 2: An example graph G with time windows and travel times. Three tables
specify the cost vector as well as the optimal routes for each row.


Let $P$ be an optimal route in $G$ during all times in a time interval $I = [a, b]$. Let
$W > 0$ be the sum of the waiting times when a vehicle follows $P$ starting from $u$ at
time $a$. If $b > a + W$, $P$ makes a transition at time $a + W$: before this time, a vehicle
following $P$ is forced to wait at one or more nodes, but after this time there are no waits.
$P$ is called a transition route and $a + W$ its transition time. In the transition route $P$,
there is at least one node $w$ such that if the vehicle starts at any time $t < a + W$,
it must wait at $w$. Such a node is called a bottle-neck node of $P$, and $bn(P)$ denotes the set of
such nodes [21].

Lemma 3 [21] Suppose $P_1$ and $P_2$ are two transition routes from $u$ to $v$ that are
optimal during time intervals $I_1$ and $I_2$ respectively. If $I_1$ occurs strictly before $I_2$ then
$bn(P_1) \cap bn(P_2) = \emptyset$.

The cost vector $T_{u,v}$ can be represented by a table satisfying the following properties
[21]:

1. each row $r$ of the table consists of $interval(r)$ and $cost(r)$;

2. $interval(r)$ is of the form $[a, b]$ or $[a, \infty)$, where $0 \le a \le b < \infty$, and $a$ and $b$ are the
lower and upper bounds of the interval respectively;

3. for rows $r_1$ and $r_2$, $r_1 < r_2$, all elements of $interval(r_1)$ precede all elements of
$interval(r_2)$;

4. the union of $interval(r)$ over all rows $r$ is the interval $[0, \infty)$;

5. $cost(r)$ is a function of the form $\alpha$ or $\tau + \alpha$, where $\alpha$ is a constant and $\tau$ is a variable;

6. if $cost(r) = \alpha$ (respectively $\tau + \alpha$) then for all $t \in interval(r)$, $P_{u,v}(t)$ has cost
$\alpha$ (respectively $t + \alpha$);

7. there is one route $P_r$ associated with each $r$ such that $P_r$ is an optimal u-v route
when the vehicle starts from $u$ at any time $t \in interval(r)$;

and there is no smaller table satisfying the above conditions.

The complexity of a cost vector $T_{u,v}$, $complex(T_{u,v})$, is the minimum number of
rows required to represent it as a table; $complex(T_{u,v}) \in O(n)$ [21].

Lemma 4 [21] Given two nodes $u$ and $v$, $complex(T_{u,v}) \le 4n$.

3.3.2.2 Pointer Doubling algorithm 

In this section, we describe our parallel algorithm for the APSR problem. This algorithm is
based on the composition and minimization of cost vectors to compute new cost vectors,
and it uses the pointer doubling technique introduced by Fortune and Wyllie [15]. The pointer
doubling technique is inherent in such algorithms as computing the transitive closure of
an adjacency matrix [12] and algorithms for tree contraction [12].

The high-level description of the algorithm is [21]:

1. For every $(u, v) \in E(G)$ in parallel compute $T_{u,v}$.

2. For every non-edge $(u, v)$, let $T_{u,v}$ be the table with interval $[0, \infty)$ and cost

3.4 Parallel algorithms for dSVRPTW-line and rSVRPTW-line Problems

Tsitsiklis [41] and Psaraftis et al. [33] have used the dynamic programming technique to
obtain quadratic time algorithms for the dSVRPTW-line and rSVRPTW-line problems
respectively. Our algorithms for these problems make use of the parallel algorithm for the
shortest path problem for LDAG described in Section 2.3, combined with the dynamic
programming formulation of Tsitsiklis and Psaraftis et al. We describe the sequential
algorithm given by Tsitsiklis in Section 3.4.1 and then, in Section 3.4.2, we give our
parallel algorithms for the dSVRPTW-line problem. The parallel algorithm for the
rSVRPTW-line problem is similar to that for dSVRPTW-line, so we do not discuss it further.

Let $G$ be an instance of dSVRPTW-line. We can fix $G$ to be the network ordered in
the form $a_m, a_{m-1}, \ldots, a_1, D, b_1, \ldots, b_p$, where $D$ is the central depot from where the
vehicle originates and terminates. Since all the locations are on a line, there are edges
between two consecutive locations only. Every edge has a cost associated with it;
the cost denotes the time taken for the vehicle to travel between the two locations. The
optimal feasible routes for $G$ can be assumed to be in a certain canonical form. For ease
of exposition, we can assume that the nodes $a_0$ and $b_0$ both refer to the same node, the
depot $D$ [21].

Definition 19 Let $S = v_1, \ldots, v_{m+p+1}$ be an optimal feasible route that covers $G$.
Then $S$ is uniform if for any $k \le m + p + 1$, $v_1, \ldots, v_k$ is an optimal feasible route that
covers the subgraph of $G$ induced by $a_i, a_{i-1}, \ldots, a_1, D, b_1, \ldots, b_j$ for some $i$ and $j$ such
that $k = i + j + 1$.

Lemma 9 [21] If there is a feasible route then there is an optimal feasible route
that is uniform.

Tsitsiklis [41] combined the notion of uniform routes with dynamic programming to
obtain a quadratic algorithm for the dSVRPTW-line problem.

3.4.1 Sequential algorithm for dSVRPTW-line 

Let $v_1, \ldots, v_{m+p+1}$ be a uniform route in $G$. We can represent the subsequence $v_1, \ldots, v_k$
by the ordered pair $(i, j)$ where $a_i$ and $b_j$ both appear in this subsequence and $i + j + 1 = k$.
Then, the subsequence $v_1, \ldots, v_k, v_{k+1}$ is represented by either $(i + 1, j)$ or $(i, j + 1)$
depending on whether $v_{k+1}$ is $a_{i+1}$ or $b_{j+1}$. More generally, for $0 \le i \le m$ and $0 \le j \le p$,

$\infty$.

3. Loop $\log n$ rounds:

in parallel for every pair $(u, v)$

(a) $A = \{w \mid w \in V(G);\ (u, w), (w, v) \in E(G)\}$

(b) $\forall w \in A$, compose $T_{u,w}$ and $T_{w,v}$ to form $T^w_{u,v}$

(c) Let $T_{u,v} = \min\{T_{u,v}, \min\{T^w_{u,v} \mid w \in A\}\}$

(d) If $(u, v)$ is not in $E(G)$ then add edge $(u, v)$ to $E(G)$
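
The doubling loop of Step 3 can be simulated sequentially as follows. This is only a sketch under strong simplifying assumptions: each cost vector is stored densely as a list indexed by every integer start time up to a hypothetical `horizon` (rather than as an $O(n)$-row table), all finite deadlines are assumed to be at most `horizon`, and the pairs are processed one after another instead of in parallel. The function name and the example network are illustrative, not taken from the thesis.

```python
import math


def all_pairs_by_doubling(nodes, edges, r, d, horizon):
    """Return T with T[u][v][t] = cost of an optimal u-v route started at time t.

    edges maps (u, v) to a positive integer travel time; math.inf marks
    start times for which no feasible route exists.
    """
    INF = math.inf
    times = range(horizon + 1)

    def edge_vector(u, v):
        w = edges[(u, v)]
        return [max(r[v], t + w) if t + w <= d[v] else INF for t in times]

    # Steps 1 and 2: edge tables, and an all-infinite table for every non-edge.
    T = {u: {v: edge_vector(u, v) if (u, v) in edges else [INF] * (horizon + 1)
             for v in nodes if v != u}
         for u in nodes}

    rounds = max(1, math.ceil(math.log2(len(nodes))))
    for _ in range(rounds):                              # Step 3
        new_T = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if v == u:
                    continue
                best = list(T[u][v])
                for w in nodes:
                    if w == u or w == v:
                        continue
                    first, second = T[u][w], T[w][v]
                    for t in times:
                        mid = first[t]                   # service time at w
                        if mid <= horizon:               # False for INF
                            best[t] = min(best[t], second[mid])   # compose, then minimize
                new_T[u][v] = best
        T = new_T
    return T


# Hypothetical example network (same shape as a small S-T instance).
nodes = ["S", "a", "b", "T"]
edges = {("S", "a"): 2, ("S", "b"): 5, ("a", "b"): 1, ("a", "T"): 10, ("b", "T"): 2}
r = {"S": 0, "a": 3, "b": 0, "T": 0}
d = {"S": 20, "a": 10, "b": 6, "T": 9}

T = all_pairs_by_doubling(nodes, edges, r, d, horizon=20)
print(T["S"]["T"][0], T["S"]["T"][2], T["S"]["T"][4])    # 6 7 inf
```

After the first round only routes with at most two edges are accounted for (the S-T cost at start time 0 is 7, via b); the second round finds the three-edge route S, a, b, T with cost 6, in line with the doubling of the route length in each round.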

The operation of composing $T_1$ and $T_2$ can be viewed graphically as follows: for each
row $r$ in $T_1$, calculate the corresponding cost at both ends of the time interval for that
row. The cost interval so computed becomes the time interval of $T_2$. Then copy the graph
for $T_2$ to get the graph for $T_{12}$ for that interval. Putting it more formally, we have the
following algorithm.

Algorithm for composing $T_1$ and $T_2$.

1. for each row $r$ in $T_1$, find the rows $s$ in $T_2$ such that the cost interval of $r$ overlaps with
the time interval of $s$;

2. for each pair $r$ in $T_1$ and $s$ in $T_2$ (such that $cost(r)$ overlaps with $interval(s)$):

(a) compute the cost function by substituting $cost(r)$ in $cost(s)$;

(b) create a row in $T_{12}$ with the newly computed cost function.

3. merge the rows in $T_{12}$ which have identical cost function and route.
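
A sequential sketch of the composition steps is given below. It checks every pair of rows directly, so it does not use the rank-based overlap search of the parallel algorithm. The `Row` type (with `slope` 0 for a constant cost $\alpha$ and 1 for a cost $\tau + \alpha$) and the function names are assumptions of the sketch; the two input tables are the ones read off the composition example of this section.

```python
import math
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    lo: int                    # first start time covered by the row
    hi: float                  # last start time covered (math.inf for the final row)
    slope: int                 # 0 for a constant cost alpha, 1 for a cost tau + alpha
    alpha: float               # the constant alpha
    route: Tuple[str, ...]


def evaluate(table: List[Row], t: float) -> float:
    """Cost of the table at start time t."""
    for row in table:
        if row.lo <= t <= row.hi:
            return row.slope * t + row.alpha
    raise ValueError("start time not covered by the table")


def compose(t1: List[Row], t2: List[Row]) -> List[Row]:
    """Table of t2(t1(tau)): follow t1, then start t2 at the resulting cost."""
    out: List[Row] = []
    for r in t1:
        for s in t2:
            if r.slope == 0:                   # constant cost: either inside interval(s) or not
                if not (s.lo <= r.alpha <= s.hi):
                    continue
                lo, hi = r.lo, r.hi
            else:                              # tau + alpha must fall inside interval(s)
                lo, hi = max(r.lo, s.lo - r.alpha), min(r.hi, s.hi - r.alpha)
                if lo > hi:
                    continue
            # Step 2(a): substitute cost(r) into cost(s).
            row = Row(lo, hi, r.slope * s.slope, s.alpha + s.slope * r.alpha,
                      r.route + s.route[1:])
            last = out[-1] if out else None
            if last and last.hi + 1 == row.lo and last[2:] == row[2:]:
                out[-1] = last._replace(hi=row.hi)     # Step 3: merge identical neighbours
            else:
                out.append(row)
    return out


t_uw = [Row(0, 1, 0, 4, ("u", "w1", "w")), Row(2, 5, 1, 3, ("u", "w1", "w")),
        Row(6, 7, 0, 9, ("u", "w2", "w")), Row(8, math.inf, 1, 2, ("u", "w2", "w"))]
t_wv = [Row(0, 7, 0, 12, ("w", "v1", "v")), Row(8, 8, 1, 5, ("w", "v2", "v")),
        Row(9, 11, 0, 13, ("w", "v2", "v")), Row(12, math.inf, 1, 2, ("w", "v2", "v"))]

for row in compose(t_uw, t_wv):
    print(row)
```

For these inputs the sketch produces four rows: $[0, 4]$ with cost 12, $[5, 5]$ with cost $\tau + 8$, $[6, 9]$ with cost 13 and $[10, \infty)$ with cost $\tau + 4$.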

Algorithm for finding the overlapped intervals.

1. For each row $r \in T_1$ calculate the corresponding cost at both ends of the time
interval. The table now has four columns, with the first two columns
representing the time interval (say $T_1(1), T_1(2)$) and the last two representing the cost
interval (say $T_1(3), T_1(4)$).

Let $T_2(1), T_2(2)$ be the columns representing the lower and upper ends of the time intervals
of $T_2$.

2. Merge arrays $T_2(1)$ and $T_1(3)$ and call the result $T_l$.

3. Form an array $T'_l$ where, if $T_l(i) \in T_2(1)$ then $T'_l(i)$ = rank of $T_l(i)$ in $T_2(1)$, else $T'_l(i) = 0$.

4. For every $T_l(i)$ such that $T_l(i) \in T_1(3)$ ($T'_l(i) = 0$), find the leftmost time interval
of $T_2$ in which the cost interval of $T_1$ falls. (Find the nearest larger value on the left hand side
in $T'_l$.) Call it $r_l$. In other words, $r_l$ gives the index of the first row in $T_2$ which
overlaps with the cost interval of $T_1$.

5. Merge arrays $T_1(4)$ and $T_2(2)$ and call the result $T_r$. Form an array $T'_r$ where, if $T_r(i) \in T_2(2)$
then $T'_r(i)$ = rank of $T_r(i)$ in $T_2(2)$, else $T'_r(i) = 0$.

6. For every $T_r(i)$ where $T_r(i) \in T_1(4)$ ($T'_r(i) = 0$), find the rightmost time interval of $T_2$
in which the cost interval of $T_1$ falls. (Find the nearest larger value on the right hand side
in $T'_r$.) Call it $r_r$. In other words, $r_r$ gives the index of the last row in $T_2$ which
overlaps with the cost interval of $T_1$.

7. For every $r$ in $T_1$, find the value $r_o = r_r - r_l + 1$.
It represents the number of rows in $T_2$ whose time intervals overlap with the cost
interval of $r$.

Since the cost functions are monotonic and the time intervals in a table do not overlap,
the total size of the composed table is at most the sum of the sizes of the tables that are
being composed. An exclusive prefix sum of $r_o$ over the rows of $T_1$ (each row's own $r_o$ is
excluded from its prefix sum) gives the index of each row in the final table constructed.
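
The overlap counts and output offsets can be checked with a short sequential sketch. It replaces the merge, rank and nearest-larger steps by binary searches from Python's `bisect` module, which yield the same indices $r_l$ and $r_r$; the function name is an assumption, and the four columns are those of the composition example in this section (cost intervals $[4,4], [5,8], [9,9], [10,\infty)$ against time intervals $[0,7], [8,8], [9,11], [12,\infty)$).

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
import math


def overlap_counts(cost_lo, cost_hi, time_lo, time_hi):
    """1-based first/last overlapping row of T2 for every row of T1, plus the count r_o."""
    # r_l: last row of T2 whose interval starts at or before the row's lowest cost
    r_l = [bisect_right(time_lo, c) for c in cost_lo]
    # r_r: first row of T2 whose interval ends at or after the row's highest cost
    r_r = [bisect_left(time_hi, c) + 1 for c in cost_hi]
    r_o = [right - left + 1 for left, right in zip(r_l, r_r)]
    return r_l, r_r, r_o


cost_lo, cost_hi = [4, 5, 9, 10], [4, 8, 9, math.inf]     # T_1(3), T_1(4)
time_lo, time_hi = [0, 8, 9, 12], [7, 8, 11, math.inf]    # T_2(1), T_2(2)

r_l, r_r, r_o = overlap_counts(cost_lo, cost_hi, time_lo, time_hi)
offsets = [0] + list(accumulate(r_o))[:-1]                # exclusive prefix sum of r_o
print(r_l, r_r, r_o, offsets)
# [1, 1, 3, 3] [1, 2, 3, 4] [1, 2, 1, 2] [0, 1, 3, 4]
```

The fourth row has count 2 because its cost interval $[10, \infty)$ meets both $[9, 11]$ and $[12, \infty)$; the counts sum to 6, the number of rows of the composed table before merging, and the offsets give the position of the first output row of each input row.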

Lemma 5 [21] $T^w_{u,v}$ can be described by a table using at most $complex(T_{u,w}) + complex(T_{w,v})$
rows.

$T^w_{u,v}$ is a cost vector in a subgraph of $G$, hence by Lemma 4 $complex(T^w_{u,v})$ is at
most $4n$; thus some rows of $T^w_{u,v}$ can be merged.

An example describing the composition operator

Let us take the tables $T_{uw}$ and $T_{wv}$ shown in Figure 2; here $T_1 = T_{uw}$ and $T_2 = T_{wv}$.

1. Computation of the cost function at both ends of each time interval of $T_{uw}$:

   $T_{uw}$                               $T_{wv}$
   interval     cost interval         interval     cost
   [0, 1]       [4, 4]                [0, 7]       12
   [2, 5]       [5, 8]                [8, 8]       $\tau + 5$
   [6, 7]       [9, 9]                [9, 11]      13
   [8, $\infty$)      [10, $\infty$)             [12, $\infty$)     $\tau + 2$

2. $T_l$ = merge($T_2(1)$, $T_1(3)$) = [0, 4, 5, 8, 9, 9, 10, 12]
   $T'_l$ = [1, 0, 0, 2, 3, 0, 0, 4]

3. $r_l$ = [1, 1, 3, 3] (nearest larger on the left hand side for zeros in $T'_l$)

4. $T_r$ = merge($T_1(4)$, $T_2(2)$) = [4, 7, 8, 8, 9, 11, $\infty$, $\infty$]
   $T'_r$ = [0, 1, 0, 2, 0, 3, 0, 4]

5. $r_r$ = [1, 2, 3, 4] (nearest larger on the right hand side for zeros in $T'_r$)

6. $r_r - r_l + 1$ = [1, 2, 1, 2]

7. This means that the
first row of $T_{uw}$ overlaps with one (first) row of $T_{wv}$,
second row of $T_{uw}$ overlaps with two (first, second) rows of $T_{wv}$,
third row of $T_{uw}$ overlaps with one (third) row of $T_{wv}$,
fourth row of $T_{uw}$ overlaps with two (third, fourth) rows of $T_{wv}$.

8. Table $T^w_{uv}$ after calculating the cost function, before merging rows:

   interval     cost          route
   [0, 1]       12            $u, w_1, w, v_1, v$
   [2, 4]       12            $u, w_1, w, v_1, v$
   [5, 5]       $\tau + 8$       $u, w_1, w, v_2, v$
   [6, 7]       13            $u, w_2, w, v_2, v$
   [8, 9]       13            $u, w_2, w, v_2, v$
   [10, $\infty$)     $\tau + 4$       $u, w_2, w, v_2, v$

9. Table $T^w_{uv}$ after merging:

   interval     cost          route
   [0, 4]       12            $u, w_1, w, v_1, v$
   [5, 5]       $\tau + 8$       $u, w_1, w, v_2, v$
   [6, 9]       13            $u, w_2, w, v_2, v$
   [10, $\infty$)     $\tau + 4$       $u, w_2, w, v_2, v$

Lemma 6 The composition operation $\{\forall w \in A \mid T^w_{u,v} = T_{u,w} + T_{w,v}\}$ can be performed
in $O(\log n)$ time using processors on a CREW PRAM.

Proof: Our algorithm for the composition of two tables is based on finding a prefix sum
on inputs of size $O(n)$, merging two arrays of size $O(n)$ and finding the nearest larger
value in an array of length $O(n)$. These operations take $O(\log n)$ time using processors
on a CREW PRAM [28, 26, 6]. So, $O(\log n)$ time with processors is sufficient to
compose two tables. The composition operation described above needs the composition of
$n$ pairs of tables. So, it takes $O(\log n)$ time with processors. □

One way of looking at the problem of finding the minimum of $T_1$ and $T_2$ is to find, for
every row of $T_1$, the time intervals of $T_2$ which overlap with its time interval, and then to find the
minimum cost on the overlapped intervals. Putting it more formally, we have the following
algorithm.

Algorithm for finding the minimum of two tables $T_1$ and $T_2$.

1. for each row $r$ in $T_1$, find the rows $s$ in $T_2$ such that the time interval of $r$ overlaps with
the time interval of $s$;

2. for each pair $r$ in $T_1$ and $s$ in $T_2$ (such that $interval(r)$ overlaps with $interval(s)$):

(a) compute the cost function by taking the minimum of $cost(r)$ and $cost(s)$;

(b) create a row in $T_{12}$ with the newly computed cost function;

3. merge the rows in $T_{12}$ which have identical cost function and route.

For finding the overlapped intervals we can follow a procedure similar to the one described for
the algorithm for composing two tables.
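
The minimization steps can be sketched sequentially as follows. Several things here are assumptions of the sketch rather than statements of the thesis: the `Row` layout (`slope` 0 for a constant cost $\alpha$, 1 for $\tau + \alpha$), the use of `alpha = math.inf` for an infeasible row, the splitting of an overlap where a constant and a $\tau + \alpha$ cost cross, and the third row of the first input table, which is not shown in the text and was chosen to be consistent with the overlap counts of the example.

```python
import math
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    lo: int                    # first start time covered by the row
    hi: float                  # last start time covered (math.inf for the final row)
    slope: int                 # 0 for a constant cost alpha, 1 for a cost tau + alpha
    alpha: float               # the constant alpha (math.inf marks an infeasible row)
    route: Tuple[str, ...]


def min_pieces(r: Row, s: Row, lo: int, hi: float) -> List[Row]:
    """Pointwise minimum of two rows on their common interval [lo, hi]."""
    if r.slope == s.slope:
        best = r if r.alpha <= s.alpha else s
        return [best._replace(lo=lo, hi=hi)]
    line, const = (r, s) if r.slope == 1 else (s, r)
    cross = const.alpha - line.alpha          # tau + a <= c exactly when tau <= c - a
    pieces = []
    if lo <= cross:
        pieces.append(line._replace(lo=lo, hi=min(hi, cross)))
    if cross < hi:
        pieces.append(const._replace(lo=max(lo, cross + 1), hi=hi))
    return pieces


def minimum(t1: List[Row], t2: List[Row]) -> List[Row]:
    out: List[Row] = []
    for r in t1:
        for s in t2:
            lo, hi = max(r.lo, s.lo), min(r.hi, s.hi)
            if lo > hi:
                continue                       # the two time intervals do not overlap
            for piece in min_pieces(r, s, lo, hi):
                last = out[-1] if out else None
                if last and last.hi + 1 == piece.lo and last[2:] == piece[2:]:
                    out[-1] = last._replace(hi=piece.hi)   # step 3: merge equal neighbours
                else:
                    out.append(piece)
    return out


t_uw1w = [Row(0, 1, 0, 4, ("u", "w1", "w")), Row(2, 5, 1, 3, ("u", "w1", "w")),
          Row(6, math.inf, 0, math.inf, ())]              # third row assumed infeasible
t_uw2w = [Row(0, 7, 0, 9, ("u", "w2", "w")), Row(8, math.inf, 1, 2, ("u", "w2", "w"))]

for row in minimum(t_uw1w, t_uw2w):
    print(row)
```

With these inputs the result has the four rows $[0, 1]$ with cost 4, $[2, 5]$ with cost $\tau + 3$, $[6, 7]$ with cost 9 and $[8, \infty)$ with cost $\tau + 2$, the first two along $u, w_1, w$ and the last two along $u, w_2, w$.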

An example describing the minimization operator

Let us take the cost vectors $T_{u w_1 w}$ and $T_{u w_2 w}$ of the graph in Figure 2; here
$T_1 = T_{u w_1 w}$ and $T_2 = T_{u w_2 w}$.

2. $T_l$ = merge($T_2(1)$, $T_1(1)$) = [0, 0, 2, 6, 8]
   $T'_l$ = [1, 0, 0, 0, 2]

3. $r_l$ = [1, 1, 1] (nearest larger on the left hand side for zeros in $T'_l$)

4. $T_r$ = merge($T_1(2)$, $T_2(2)$) = [1, 5, 7, $\infty$, $\infty$]
   $T'_r$ = [0, 0, 1, 0, 2]

5. $r_r$ = [1, 1, 2] (nearest larger on the right hand side for zeros in $T'_r$)

6. $r_r - r_l + 1$ = [1, 1, 2]

7. This means that the
first row of $T_{u w_1 w}$ overlaps with one (first) row of $T_{u w_2 w}$,
second row of $T_{u w_1 w}$ overlaps with one (first) row of $T_{u w_2 w}$,
third row of $T_{u w_1 w}$ overlaps with two (first, second) rows of $T_{u w_2 w}$.

8. Table $T_{uw}$ after calculating the cost function (there are no rows to be merged):

   interval     cost          route
   [0, 1]       4             $u, w_1, w$
   [2, 5]       $\tau + 3$       $u, w_1, w$
   [6, 7]       9             $u, w_2, w$
   [8, $\infty$)      $\tau + 2$       $u, w_2, w$



Lemma 7 The minimization operation $T_{u,v} = \min\{T_{u,v}, \min\{T^w_{u,v} \mid w \in A\}\}$ can be
performed in $O(\log^2 n)$ time using processors on a CREW PRAM.

Proof: Finding the minimum of two tables is very similar to
the algorithm for composing two tables. So, it takes $O(\log n)$ time with processors
on a CREW PRAM. For finding the minimum of $n$ tables we can proceed in a manner
analogous to finding the minimum of $n$ numbers. Thus, the lemma follows from the fact
that the minimization operation requires finding the minimum of $n$ tables. □

Lemma 8 [21] After the first $k$ iterations of Step 3 of the pointer doubling algorithm,
the correct value of $T_{u,v}$ has been computed for every pair of nodes $u$ and $v$ when only
routes of length at most $2^k$ are taken into account.

Theorem 2 The all pairs S-T routing problem can be solved in $O(\log^3 n)$ time using
processors on a CREW PRAM.

Proof: The correctness of the algorithm follows from Lemma 8. Steps 1 and 2 can be
performed in $O(1)$ time using at most $O(n^2)$ processors. Therefore we need to show that
Step 3 can be performed in $O(\log^3 n)$ time using processors. In particular, since Step 3
involves considering all pairs $u$ and $v$ in parallel in each of its $\log n$ rounds, it suffices
to show that Steps 3(a)-3(d) can be performed in $O(\log^2 n)$ time with processors.

For any pair $(u, v)$ in Step 3(a), we independently check all nodes $w$ to see if they
are candidates for $A$; this takes $O(1)$ time with $n$ processors. Since Step 3(b) can be
performed in $O(\log n)$ time with processors by Lemma 6, it can also be performed
in $O(\log^2 n)$ time with processors (since the CREW PRAM is a self-simulating model).
Step 3(c) can be performed in $O(\log^2 n)$ time with processors by Lemma 7. Finally,
Step 3(d) takes $O(1)$ time using one processor. □

let $C(i, j; L)$ (respectively $C(i, j; R)$) be the cost of the uniform route on the network induced
by the subgraph $a_i, \ldots, a_1, D, b_1, \ldots, b_j$ in which $a_i$ is the last element of the sequence
($b_j$ is the last element respectively). The initial conditions for the dynamic programming
algorithm are $C(0, 0; L) = C(0, 0; R) = 0$.

Then, for $i, j > 0$,

$C(i, j; L) = \min\{C(i-1, j; L) + t(a_{i-1}, a_i),\ C(i-1, j; R) + t(b_j, a_i)\}$ (1)

$C(i, j; R) = \min\{C(i, j-1; R) + t(b_{j-1}, b_j),\ C(i, j-1; L) + t(a_i, b_j)\}$ (2)

Theorem 3 [41] The dSVRPTW-line problem can be solved sequentially in $O(n^2)$ time,
where $n$ is the number of locations.

Proof: The algorithm proceeds by computing $C(i, j; L)$ and $C(i, j; R)$ for $0 \le i \le m$
and $0 \le j \le p$. Each value of $C$ can be computed in constant time using the equations
given above. Since there are $O(n^2)$ values to be computed, the theorem follows. □
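
Recurrences (1) and (2) can be turned into a short dynamic program. The sketch below makes several assumptions that are not spelled out in the text: locations are given by their distances from the depot, travel time equals distance, a state whose arrival time exceeds the deadline of its last node is treated as infeasible (cost $\infty$), and the route may end at either extreme location. The function name and the example data are hypothetical.

```python
import math


def dsvrptw_line(a_pos, a_dead, b_pos, b_dead):
    """Earliest completion time of a covering route on a line, or math.inf if none exists.

    a_pos[i-1] and b_pos[j-1] are the distances of a_i and b_j from the depot
    (increasing); a_dead and b_dead are the corresponding deadlines.
    """
    m, p = len(a_pos), len(b_pos)
    xa, xb = [0] + list(a_pos), [0] + list(b_pos)       # index 0 is the depot
    da, db = [math.inf] + list(a_dead), [math.inf] + list(b_dead)
    INF = math.inf
    L = [[INF] * (p + 1) for _ in range(m + 1)]         # L[i][j] = C(i, j; L)
    R = [[INF] * (p + 1) for _ in range(m + 1)]         # R[i][j] = C(i, j; R)
    L[0][0] = R[0][0] = 0
    for i in range(m + 1):
        for j in range(p + 1):
            if i > 0:                                   # equation (1)
                best = min(L[i - 1][j] + xa[i] - xa[i - 1],
                           R[i - 1][j] + xb[j] + xa[i])
                L[i][j] = best if best <= da[i] else INF
            if j > 0:                                   # equation (2)
                best = min(R[i][j - 1] + xb[j] - xb[j - 1],
                           L[i][j - 1] + xa[i] + xb[j])
                R[i][j] = best if best <= db[j] else INF
    return min(L[m][p], R[m][p])


# a_1, a_2 lie at distances 1 and 3 on one side of the depot, b_1 at distance 2 on the other.
print(dsvrptw_line([1, 3], [20, 5], [2], [12]))                      # 8
print(dsvrptw_line([1, 3], [math.inf, math.inf], [2], [math.inf]))   # 7
```

With the deadline 5 on $a_2$ the vehicle must serve the a-side first ($a_1$ at time 1, $a_2$ at time 3) and then cross to $b_1$, finishing at time 8; without deadlines it is cheaper to visit $b_1$ first and finish at $a_2$ at time 7.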

3.4.2 Parallel algorithms for dSVRPTW-line problem 

Our parallel algorithms are based on the dynamic programming formulation in
Theorem 3. We construct a configuration network for a given instance graph $G$, with nodes
representing uniform routes in $G$ and edges representing extensions of one route to
another by the addition of one new node. Interestingly, the configuration network formed
is an LDAG. We then use the variants of shortest path algorithms for LDAGs to find the
shortest route in the configuration network.

3.4.2.1 Layered graph construction 

For $G = a_m, \ldots, a_1, D, b_1, \ldots, b_p$, the configuration network is a layered directed acyclic
graph $Q = (V, E)$, where

1. $n = m + p + 1$;

2. $V = \{(i, j, R), (i, j, L) : 1 \le i \le m,\ 1 \le j \le p\} \cup \{(0, 0, D)\} \cup \{(m + 1, p + 1, D)\}$;

3. the source and sink nodes between which we need to find the shortest path are
$S(Q) = (0, 0, D)$ and $T(Q) = (m + 1, p + 1, D)$ respectively;

4. all the nodes have zero handling time, the deadline of $(0, 0, D)$ is 0 and the
deadline of $(m + 1, p + 1, D)$ is $\infty$. The deadlines of $(i, j, L)$ and $(i, j, R)$ are $d(a_i)$ and
$d(b_j)$ respectively;

5. there are edges from $(m, p, L)$ and $(m, p, R)$ to $(m + 1, p + 1, D)$ with travel time
0;

6. there are edges from $(0, 0, D)$ to $(1, 0, L)$ and $(0, 1, R)$ with travel times $t(a_1, D)$
and $t(D, b_1)$ respectively;

7. for $1 \le i \le m$ and $1 \le j \le p$ there are edges
from $(i, j, L)$ to $(i + 1, j, L)$ with travel time $t(a_i, a_{i+1})$;
from $(i, j, L)$ to $(i, j + 1, R)$ with travel time $t(a_i, b_{j+1})$;
from $(i, j, R)$ to $(i, j + 1, R)$ with travel time $t(b_j, b_{j+1})$;
from $(i, j, R)$ to $(i + 1, j, L)$ with travel time $t(b_j, a_{i+1})$;

8. there are $n + 1$ layers in the configuration network $Q$;

9. $(0, 0, D)$ will be in the first layer, and $(m + 1, p + 1, D)$ in the last layer.
Further, the nodes $(i, j, L)$ and $(i, j, R)$ will be in the $(i + j + 1)$-th layer;

10. $b = \min(m, n - m)$;

11. the maximum number of nodes in any layer will be $2b + 1$;

12. the number of nodes in any layer $i$ will be
$2(i - 1)$ for $2 \le i \le b + 1$,
at most $2b + 1$ for $b + 2 \le i \le n - b + 1$,
$2(n - i + 2)$ for $n - b + 2 \le i \le n + 1$;

13. the first layer will have only the source node and the last layer only the sink node;

14. a cost vector or cost table is associated with each $(u, v) \in E$.

The node $(i, j, L)$ denotes the optimal route covering $[a_i, b_j]$ with $a_i$ being the last
node visited. Similarly, the node $(i, j, R)$ denotes the optimal route covering $[a_i, b_j]$ with
$b_j$ being the last node visited. The nodes $(0, 0, D)$ and $(m + 1, p + 1, D)$ denote the null
route and the optimal route respectively.
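
The layered structure can be explored with a small sketch that builds only the topology of the configuration network (travel times, deadlines and cost tables are omitted). One assumption differs from the index ranges as printed: so that the second layer contains the nodes $(1, 0, L)$ and $(0, 1, R)$ named in item 6, the sketch lets $j$ start at 0 for $L$-nodes and $i$ start at 0 for $R$-nodes. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ConfigNode = Tuple[int, int, str]


def build_configuration_network(m: int, p: int):
    """Return (layers, edges) of the configuration network for a_m..a_1, D, b_1..b_p."""
    source, sink = (0, 0, "D"), (m + 1, p + 1, "D")
    inner = [(i, j, "L") for i in range(1, m + 1) for j in range(p + 1)]
    inner += [(i, j, "R") for i in range(m + 1) for j in range(1, p + 1)]
    present = set(inner)

    layers: Dict[int, List[ConfigNode]] = defaultdict(list)
    layers[1].append(source)
    for node in inner:
        layers[node[0] + node[1] + 1].append(node)      # (i, j, .) sits in layer i + j + 1
    layers[m + p + 2].append(sink)                      # layer n + 1

    edges = []
    for i, j, side in [source] + inner:
        # Extend the covered interval by one location: a_{i+1} or b_{j+1}.
        for nxt in ((i + 1, j, "L"), (i, j + 1, "R")):
            if nxt in present:
                edges.append(((i, j, side), nxt))
    edges += [(last, sink) for last in ((m, p, "L"), (m, p, "R")) if last in present]
    return dict(layers), edges


layers, edges = build_configuration_network(2, 2)
print([len(layers[k]) for k in sorted(layers)])          # [1, 2, 4, 4, 2, 1]
```

For $m = p = 2$ (so $n = 5$) the sketch yields $n + 1 = 6$ layers, every edge joins consecutive layers, and no node has in-degree greater than 2, which is the property used by the layer-removal algorithms.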

Lemma 10 [21] For $Q$ the configuration network of a line $G$, the optimal $S(Q)$-$T(Q)$
route corresponds to an optimal uniform covering route of $G$.

Proof: From the construction of $Q$ and Theorem 3, it follows that a route from $S(Q)$
to $T(Q)$ in $Q$ with cost $C$ corresponds to a uniform route in $G$ whose cost is also $C$. As
well, given an optimal uniform route $P$ in $G$ of cost $C$, there is a route in $Q$ from $S(Q)$
to $T(Q)$ of cost $C$, since every initial segment of $P$ is also a uniform route and the nodes
of $Q$ describe all uniform routes as well as transitions from one uniform route to another
in one step. Thus, the lemma follows. □

3.4.2.2 $O(n)$ time optimal parallel algorithm

In this section we describe an $O(n)$ time optimal parallel algorithm for the dSVRPTW-line
problem when the vehicle starts at time 0.

1. Construct an LDAG for the given instance of the problem as described above in
Section 3.4.2.1.

2. for n + 1 iterations do
   begin
      remove all nodes at second layer;
   end;

For removing the nodes at the second layer, we allocate one processor to every node in the third
layer and find the shortest path from the source node to that node. Since the in-degree of
any node in the LDAG is 2, the number of paths from the source node to any node in the third
layer is at most 2. So, we can find the shortest path from the source node to any node in the
third layer in $O(1)$ time. Since there can be at most $n + 1$ nodes in the third layer in any iteration
(the maximum number of nodes in any layer before the first iteration is $n + 1$), we need
$O(n)$ processors in each iteration. Since there are $n + 1$ iterations, it takes $O(n)$ time.
Thus, we can find an optimal route for the dSVRPTW-line problem
in $O(n)$ time with $O(n^2)$ work.

3.4.2.3 O(log^n) time Parallel algorithm 

This algorithm is much faster than the $O(n)$ time optimal algorithm given above. The
algorithm computes the optimal path for all vehicle starting times in the interval $[0, \infty)$.

1. Construct an LDAG for the given instance of the problem as described above in
Section 3.4.2.1.

2. for k = 0 to log n + 1 do /* O(log n) iterations */
      for i = 2 to (current number of layers) - 1 do
      parbegin
         if i is even then remove nodes at layer i;
      parend;

The algorithm for removing all nodes at layer $i$ is given below. Here, $d$ is the maximum
in-degree (or out-degree) of any node in the current iteration. Clearly, $d \le n + 1$.

for each node r of layer i - 1 do /* at most 2b+1 times */
   for each node v at this layer (i) incident from r do /* at most d times */
      for each node p of layer i + 1 incident from v do /* at most d times */
      parbegin
         (a) $T^v_{r,p}$ = compose($T_{r,v}$, $T_{v,p}$)
         (b) $T_{r,p}$ = $\min_v(T^v_{r,p})$
      parend;

3.4.2.4 Analysis of Algorithm

Lemma 11 The maximum degree of any node before the $k$-th iteration will be $\min(2^k, 2b + 1)$.

Proof: Any node in the graph has degree less than or equal to the degree of the source
node of the shortest path to be found. This is because the source node satisfies the
same constraints as any other node in the graph. So, it is enough to prove that the
degree of the source is $\min(2^k, 2b + 1)$ before the $k$-th iteration. It is evident from the algorithm
that after every iteration the number of layers decreases by a factor of 2; further, after
any iteration layer $i$ will become layer $(i + 1)/2$ (if $i$ is odd). Therefore, in the $k$-th iteration the
layer adjacent to the first layer (the second layer) will be the same as layer $2^{k-1} + 1$ before the
first iteration. (This can be proved by simple induction.) The number of nodes in layer
$i$ before the first iteration is $2(i - 1)$. Therefore, the number of nodes in the second layer before
the $k$-th iteration will be $2(2^{k-1}) = 2^k$. It is evident from the description of our graph that every
node is reachable from the source (this implies that every node in the second layer is
connected to the source), and the maximum number of nodes in any layer is at most $2b + 1$,
so the degree of a node in any iteration cannot exceed $2b + 1$. Therefore, the degree
of the source node is $\min(2^k, 2b + 1)$ before the $k$-th iteration. Thus, the result follows. □

Lemma 12 For removing nodes at layer $i$ in the $k$-th iteration, the algorithm on a CREW
PRAM takes $O(\log(\min(2b + 1, 2^k)) \cdot \log n)$ time using processors.

Proof: For removing nodes at layer $i$ we need to find $T_{r,p}$ where $r \in$ layer $i - 1$ and
$p \in$ layer $i + 1$. For finding $T_{r,p}$ we need to find the minimum of $T^v_{r,p}$ where $v \in$ layer $i$, $r$
is incident on $v$, and $v$ is incident on $p$. If $d$ is the degree of $r$ then we need to compose
$d$ tables and find the minimum among these $d$ tables. By Lemmas 6 and 7, we can compose
two tables or find the minimum of two tables in $O(\log n)$ time with processors. We can
compose $d$ pairs of tables into $d$ tables in $O(\log d \cdot \log n)$ time with processors.
We can also find the minimum of $d$ tables in $O(\log d \cdot \log n)$ time with processors.
Therefore, we can find $T_{r,p}$ in $O(\log d \cdot \log n)$ time with processors on a CREW
PRAM. Since there can be $d$ nodes in layer $i + 1$ associated with $v$, and as there can be
at most $(2b + 1)$ nodes in layer $i - 1$, the number of processors for removing nodes at
layer $i$ is $(2b + 1) \cdot d$ times the number needed to find one $T_{r,p}$. We have by Lemma 11
that the degree of any node in the $k$-th iteration is $\min(2b + 1, 2^k)$. Therefore, the $k$-th
iteration needs $(2b + 1) \cdot \min(2b + 1, 2^k)$ times that number of
processors and $O(\log(\min(2b + 1, 2^k)) \cdot \log n)$ time. □

Theorem 4 There is a CREW PRAM algorithm for dSVRPTW-line that takes
$O(\log^2 n \cdot \log b)$ time using processors.

Proof: The initial number of layers in the graph is $n + 1$. In every iteration we will
be removing half of the layers. Therefore, there will be about $(n + 1)/2^{k-1}$ layers before
the $k$-th iteration. Combining this with Lemma 12, in the $k$-th iteration the time will be
$O(\log(\min(2b + 1, 2^k)) \cdot \log n)$ with the corresponding number of processors. These
expressions will be maximum when $2^{k-1} > 2b + 1$, that is, when $k > \log(2b + 1) + 1$. Since
each iteration takes at most $O(\log n \cdot \log b)$ time and there are $\log n + 1$ iterations, the
time required for our algorithm is $O(\log^2 n \cdot \log b)$. Thus the result follows. □

3.5 Traveling Repairman Problem 

Afrati et al. [1] have described an $O(n^2)$ algorithm for the special case of the traveling repairman
problem in which the handling time is zero and all the locations are on a line without
time windows associated with them (see Tsitsiklis [41]). Because that paper was not
available to us, we describe our own $O(n^2)$ time dynamic programming algorithm for this
problem in Section 3.5.1. In Section 3.5.2 we give parallel algorithms for this special case.

Our parallel algorithm makes use of the dynamic programming formulation combined
with the shortest path algorithm for LDAG.

Let G be an instance of TRP-line. We can fix G to be the network ordered in the
following canonical form $a_m, a_{m-1}, \ldots, a_1, D, b_1, \ldots, b_p$, where D is the central depot
from where the vehicle originates and terminates. Since all the locations are on a line,
there are edges between two consecutive locations only. Every edge has a cost
associated with it denoting the time taken for the vehicle to travel between the two
locations. The optimal feasible routes for G can be assumed to be in a certain canonical
form. For ease of exposition, we assume that the nodes $a_0$ and $b_0$ both refer to the
depot D.

Definition 20 Let $S = v_1, \ldots, v_{m+p+1}$ be an optimal feasible route that covers G.
Then S is uniform if for any $k < m + p + 1$, $v_1, \ldots, v_k$ is an optimal feasible route that
covers the subgraph of G induced by $a_i, \ldots, a_1, D, b_1, \ldots, b_j$ for some i and j such
that $k = i + j + 1$.

3.5.1 Sequential algorithm 

Let $v_1, \ldots, v_{m+p+1}$ be a uniform route in G. We can represent the subsequence $v_1, \ldots, v_k$
by the ordered pair $(i, j)$ where $a_i$ and $b_j$ both appear in this subsequence and $i + j + 1 = k$.
Then, the subsequence $v_1, \ldots, v_k, v_{k+1}$ is represented by either $(i + 1, j)$ or $(i, j + 1)$
depending on whether $v_{k+1}$ is $a_{i+1}$ or $b_{j+1}$. More generally, for $0 \le i \le m$ and $0 \le j \le p$,
let $C(i, j; L)$ and $C(i, j; R)$ be the cost of the uniform route on the network induced by the
subgraph $a_i, \ldots, a_1, D, b_1, \ldots, b_j$ in which $a_i$ is the last element of the sequence ($b_j$
is the last element, respectively) and let $T(i, j; L)$, $T(i, j; R)$ be the times taken for
these uniform routes. The initial conditions for the dynamic programming algorithm are

$$C(0, 0; L) = C(0, 0; R) = T(0, 0; L) = T(0, 0; R) = 0.$$

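Then, for $i, j > 0$, let

\begin{align}
\alpha &= T(i, j-1; L) + t(i, j), \tag{3}\\
\beta &= T(i, j-1; R) + t(j-1, j), \tag{4}\\
\gamma &= T(i+1, j; L) + t(i, i+1) \quad \text{and} \tag{5}\\
\delta &= T(i+1, j; R) + t(i, j). \tag{6}
\end{align}

Then

\begin{align}
C(i, j; R) &= \min\{C(i, j-1; L) + \alpha,\ C(i, j-1; R) + \beta\}, \tag{7}\\
C(i, j; L) &= \min\{C(i+1, j; R) + \delta,\ C(i+1, j; L) + \gamma\}, \tag{8}\\
T(i, j; R) &= \begin{cases} \alpha & \text{if } C(i, j; R) = C(i, j-1; L) + \alpha \\ \beta & \text{if } C(i, j; R) = C(i, j-1; R) + \beta \end{cases} \tag{10}\\
T(i, j; L) &= \begin{cases} \delta & \text{if } C(i, j; L) = C(i+1, j; R) + \delta \\ \gamma & \text{if } C(i, j; L) = C(i+1, j; L) + \gamma \end{cases} \tag{11}
\end{align}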

Theorem 5 The special case of TRP-line in which the handling times of all nodes is
zero can be solved in $O(n^2)$ time, where n is the number of locations.

Proof: The algorithm proceeds by computing $C(i, j; L)$ and $C(i, j; R)$ for $0 \le i \le m$
and $0 \le j \le p$. Each value of C can be computed in constant time using the equations
given above. Since there are $O(n^2)$ values to be computed, the theorem follows. ∎
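
The quadratic table fill can be illustrated with a short Python sketch. It is not a transcription of the recurrences above: it assumes the equivalent bookkeeping in which a move of length t, made while w locations are still unvisited, adds w * t to the total waiting time, so a single cost table suffices. The function name `trp_line_cost` and the input convention (distances from the depot, sorted increasingly on each side) are hypothetical.

```python
def trp_line_cost(left, right):
    """Minimum sum of waiting times for TRP on a line with zero handling time.

    left[i-1]  = distance from the depot to a_i (increasing in i)
    right[j-1] = distance from the depot to b_j (increasing in j)
    """
    m, p = len(left), len(right)
    n = m + p                       # number of locations to serve
    L = [0] + list(left)            # L[0] = R[0] = 0 is the depot
    R = [0] + list(right)
    INF = float("inf")
    # C[i][j][s]: best cost after serving a_1..a_i and b_1..b_j,
    # standing at a_i (s = 0) or at b_j (s = 1).
    C = [[[INF, INF] for _ in range(p + 1)] for _ in range(m + 1)]
    C[0][0][0] = C[0][0][1] = 0
    for k in range(1, n + 1):       # k = i + j locations served so far
        waiting = n - (k - 1)       # locations still waiting during this move
        for i in range(max(0, k - p), min(m, k) + 1):
            j = k - i
            if i >= 1:              # last location served is a_i
                C[i][j][0] = min(
                    C[i - 1][j][0] + waiting * (L[i] - L[i - 1]),
                    C[i - 1][j][1] + waiting * (L[i] + R[j]),
                )
            if j >= 1:              # last location served is b_j
                C[i][j][1] = min(
                    C[i][j - 1][1] + waiting * (R[j] - R[j - 1]),
                    C[i][j - 1][0] + waiting * (R[j] + L[i]),
                )
    return min(C[m][p])
```

Each of the O(n^2) table entries is filled in constant time, which is the source of the quadratic bound.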


3.5.2 Parallel algorithms for TRP-line problem

Our parallel algorithm is based on the dynamic programming formulation in Theorem 5.
We construct a configuration network for a given instance graph G with nodes representing
uniform routes in G and edges representing extensions of one route to another by the
addition of one new node. Interestingly, the configuration network formed is an LDAG.
We then use the variants of shortest path algorithms for LDAG to find the shortest route
in the configuration network.


3.5.2.1 Layered graph construction 

We can construct a layered graph that is similar to the one in the dSVRPTW-line problem.
There will be $n + 1$ layers as in the dSVRPTW-line problem. For $G = a_m, \ldots, a_1, D, b_1, \ldots, b_p$,
the configuration network is a layered directed acyclic graph $\mathcal{G} = (V, E)$, where

1. $n = m + p + 1$;

2. $V = \{(i, j, R), (i, j, L) : 1 \le i \le m,\ 1 \le j \le p\} \cup \{(0, 0, D)\} \cup \{(m + 1, p + 1, D)\}$;

3. The source and sink nodes between which we need to find the shortest path are
$S(\mathcal{G}) = (0, 0, D)$ and $T(\mathcal{G}) = (m + 1, p + 1, D)$ respectively;

4. If an edge $(u, v)$ is in the graph then it denotes the optimal cost route between the
vertices u and v. We associate three values $d(u, v)$, $a(u, v)$ and $len(u, v)$ with
each edge. Let $u, u_1, u_2, \ldots, u_l, v$ be the optimal cost route between the vertices
u and v. Then $d(u, v)$ denotes the sum of waiting times of nodes in that path
(i.e. $d(u, v) = (l + 1) \cdot t(u, u_1) + l \cdot t(u_1, u_2) + \cdots + 2 \cdot t(u_{l-1}, u_l) + t(u_l, v)$),
$a(u, v)$ denotes the traveling time from u to v in the optimal path (i.e. $a(u, v) =
t(u, u_1) + t(u_1, u_2) + t(u_2, u_3) + \cdots + t(u_l, v)$) and $len(u, v)$ denotes
the length of the optimal path (i.e. $l + 1$). The initial values of $d(u, v)$ and $a(u, v)$ are
$t(u, v)$, where $t(u, v)$ denotes the travel time between those nodes. The initial
value of $len(u, v)$ is 1 if $t(u, v) \ne 0$, else $len(u, v)$ is 0.

5. There are edges from $(m, p, L)$ and $(m, p, R)$ to $(m + 1, p + 1, D)$.

6. There are edges from $(0, 0, D)$ to $(1, 0, L)$ and $(0, 1, R)$.

7. For $1 \le i \le m$ and $1 \le j \le p$ there are edges
from $(i, j, L)$ to $(i + 1, j, L)$, from $(i, j, L)$ to $(i, j + 1, R)$, from $(i, j, R)$ to
$(i, j + 1, R)$, and from $(i, j, R)$ to $(i + 1, j, L)$.

8. The source and sink will be in the first layer and last layer respectively. $(i, j, L)$ and $(i, j, R)$
will be in the $(i + j)$-th layer;

9. $b = \min(m, n - m)$;

10. The maximum number of nodes at any layer will be $2b + 1$.

The node $(i, j, L)$ denotes the optimal route covering $[a_i, b_j]$ with $a_i$ being the last
node visited. Similarly the node $(i, j, R)$ denotes the optimal route covering $[a_i, b_j]$ with
$b_j$ being the last node visited. Nodes $(0, 0, D)$ and $(m + 1, p + 1, D)$ denote the null
route and optimal route respectively.

3.5.2.2 $O(n)$ time optimal parallel algorithm

1. Construct an LDAG for a given instance of the problem as described above in
Section 3.5.2.1.

2. for n + 1 iterations do
begin
    remove all nodes at second layer;
end;

For removing nodes at the second layer, we allocate one processor to every node in the third
layer, and find the path from the source node to that node such that the sum of waiting times of
nodes in the path is minimum. Since the in-degree of any node in the LDAG is 2, the number of
paths from the source node to any node in the third layer is 2. So, we can find the shortest
path from the source node to any node in the third layer in $O(1)$ time. Since there can be at
most $n + 1$ nodes in the third layer in any iteration (the maximum number of nodes in any
layer before the first iteration is $n + 1$), we need $O(n)$ processors in each iteration. Since there
are $n + 1$ iterations, it takes $O(n)$ time. Thus, the special case of TRP-line in which the
handling times of all nodes is zero can be solved in $O(n)$ time with $O(n^2)$ work.

3.5.2.3 $O(\log^2 n)$ time parallel algorithm for TRP-line problem

1. Construct an LDAG for a given instance of the problem as described above in
Section 3.5.2.1.

2. for t = 0 to log n + 1 do /* O(log n) iterations */
    for i = 2 to (number of layers in iteration t) - 1 do
    parbegin
        if i is even then remove nodes at layer i;
    parend;

The algorithm for removing all nodes at layer i is given below. Here, d is the maximum
in-degree (or out-degree) of any node in the current iteration. Clearly, $d \le 2n + 1$.

for each node r of layer i - 1 do /* at most 2b + 1 times */
    for each node v at this layer i incident from r do /* at most d times */
        for each node p of layer i + 1 incident from v do /* at most d times */
        parbegin
            (a) d(r, p) = min_v (d(r, v) + d(v, p) + len(v, p) * a(r, v));
            (b) Let u be the node that gives rise to the minimum value of d(r, p);
            (c) a(r, p) = a(r, u) + a(u, p);
            (d) len(r, p) = len(r, u) + len(u, p);
        parend;
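
Steps (a) to (d) amount to composing two edge labels and keeping the cheapest composition. The following Python sketch shows that rule in isolation, assuming each edge label is stored as a tuple (d, a, len) with the meanings given in Section 3.5.2.1; the function names `compose` and `best_composition` are hypothetical.

```python
def compose(first, second):
    """Label of the route 'first followed by second'.

    Each label is (d, a, length): sum of waiting times, travel time,
    and number of nodes served along the piece.
    """
    d1, a1, len1 = first
    d2, a2, len2 = second
    # Every node served on the second piece is reached a1 time units later,
    # which is the len(v, p) * a(r, v) term of step (a).
    return (d1 + d2 + len2 * a1, a1 + a2, len1 + len2)


def best_composition(pairs):
    """Pick the composed label with the smallest d.

    pairs: iterable of (label(r, v), label(v, p)), one per candidate node v.
    """
    return min((compose(f, s) for f, s in pairs), key=lambda label: label[0])


# Three consecutive unit edges with travel times 3, 2 and 4:
# arrival times are 3, 5 and 9, so the sum of waiting times is 17.
e1, e2, e3 = (3, 3, 1), (2, 2, 1), (4, 4, 1)
print(compose(compose(e1, e2), e3))   # (17, 9, 3)
```

The composition is associative, which is what allows the layers to be removed in any grouping.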

Observe that the algorithm is similar to the one for the dSVRPTW-line problem. The correctness
proof will also be similar. It can be easily proven that the above algorithm takes $O(\log n \cdot
\log b)$ time with $O(n b^2 / \log b)$ processors on a CREW PRAM, where $b = \min(m, n - m)$.

3.5.2.4 An example

Figure 3(a) shows a TRP-line with five nodes $V_1, \ldots, V_5$. Figure 3(b) shows the LDAG
constructed following the procedure given above in this section. It has six layers. We
have shown only travel times between the nodes for simplicity. The reader can easily
infer $d(u, v)$, $a(u, v)$ and $len(u, v)$ associated with edges. Figure 3(c) and Figure 3(d)
show the LDAG in the second and third iterations respectively. The optimal solution path for
this example is $D, V_2, V_4, V_5, V_1$; the sum of waiting times of all nodes is 51 and the cost
of the path associated with the optimal solution is 28.

Figure 3: An example describing our parallel algorithm for TRP-line problem

Chapter 4


Parallel Algorithms For Coalescing 
Operations with Precedence 
Constraints In Real-time Systems 

4.1 Introduction 

In many applications [7], real-time systems are modeled as object-oriented systems. In 
an object-oriented system, each object has a set of well defined operations. Each object 
also has local variables, which may be accessed only by the operations defined in the 
interface of the object. The object decides if and when to process the requests from 
other objects. 

In most applications, coalesced operations must be defined by the programmer, due 
to the semantic issues involved. On the other hand, whether a coalescence should be 
performed or not is usually decided by the object (or system) scheduler. In the scheduler, 
the reward (usually the amount of computation time saved) for each coalesced operation 
must be pre-analyzed and recorded in a reward table before execution. When a sequence 
of requests arrives, the scheduler consults the reward table and determines which requests 
to coalesce in order to maximize the total reward. 

In a real-time system, the request patterns of many jobs are known in advance,
especially when jobs are periodic. For example, service requests from a radar monitor
are periodic and well-defined. Given K periodic jobs and the reward table, we can
determine which requests to coalesce to produce the maximum total reward. In other
words, we can find the coalescence schedule that saves the most time. If some requests
are not predefined, we can still perform the analysis at run-time if the complexity of
scheduling coalesced operations is not high. Bihari et al. [7] have proved that finding
an optimal schedule in the general case is NP-hard.

For the case of two periodic jobs, Chen et al. [9] have developed an 0(n®) time 
sequential algorithm and Liu et al. [30] have reduced it to $O(n^2)$ time. We show that
there is an $O(\log^2 n)$ time parallel algorithm using $O(n^2 / \log n)$ processors for this
problem on a CREW PRAM.

4.2 Scheduling two periodic jobs 

Liu et al. [30] have used dynamic programming to obtain a quadratic time algorithm 
for finding optimal schedule when the number of jobs is two. Our algorithm makes 
use of the parallel algorithm for the shortest path problem for layered directed acyclic 
graph (LDAG) combined with the dynamic programming formulation of Liu et al. We 
describe the sequential algorithm proposed by Liu et al. [30] in Section 4.2.1. We give 
an example for the problem in Section 4.2.2. In Section 4.3 we will use this dynamic 
programming formulation for constructing our parallel algorithm. 

4.2.1 Sequential algorithm 

Each job $J_i$ has n sequential operations $J_{i,1}, J_{i,2}, \ldots, J_{i,n}$, $i = 1, 2$. The system can be
represented as an undirected graph. Each vertex $v_{i,j}$ denotes an operation $J_{i,j}$. There
are edges between $v_{i,j}$ and $v_{i,j+1}$, $i = 1, 2$, $j = 1, 2, \ldots, n - 1$, and between $v_{1,p}$ and
$v_{2,q}$, $p = 1, 2, \ldots, n$, $q = 1, 2, \ldots, n$. Each edge is associated with a weight representing
the reward value. Thus, maximizing the total reward is equivalent to finding a maximum
weighted compatible matching in the graph [30]. Any two edges $(v_{1,r}, v_{2,s})$ and $(v_{1,p}, v_{2,q})$
in a weighted compatible matching satisfy either $r < p$ and $s < q$, or $r > p$ and $s > q$.
In other words, the maximum weighted compatible matching problem is to find a subset
of edges that do not cross each other and whose total weight is maximized. This subset
of edges is called the maximum weighted compatible matching set [7].

Let $R_{i,j}$ denote the maximum total reward for matching operations in $J_{1,1}, J_{1,2}, \ldots, J_{1,i}$
and $J_{2,1}, J_{2,2}, \ldots, J_{2,j}$; $r_{i,j}$ the weight of edge $(v_{1,i}, v_{2,j})$, $1 \le i \le n$, $1 \le j \le n$; and $c_{k,l}$
the weight of edge $(v_{k,l-1}, v_{k,l})$, $k = 1, 2$, $1 < l \le n$. Clearly, considering operations $J_{1,i}$
and $J_{2,j}$, we have the following equations [30]:




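\begin{align}
R_{i,j} &= \max\{R_{i-1,j},\ R_{i,j-1},\ c_{1,i} + R_{i-2,j},\ c_{2,j} + R_{i,j-2},\ r_{i,j} + R_{i-1,j-1}\} \notag\\
&\qquad \text{for } 2 \le i \le n,\ 2 \le j \le n, \tag{12}\\
R_{1,j} &= \max\{R_{0,j},\ R_{1,j-1},\ c_{2,j} + R_{1,j-2},\ r_{1,j} + R_{0,j-1}\} \quad \text{for } 2 \le j \le n, \tag{13}\\
R_{i,1} &= \max\{R_{i,0},\ R_{i-1,1},\ c_{1,i} + R_{i-2,1},\ r_{i,1} + R_{i-1,0}\} \quad \text{for } 2 \le i \le n, \tag{14}\\
R_{0,j} &= \max\{R_{0,j-1},\ c_{2,j} + R_{0,j-2}\} \quad \text{for } 2 \le j \le n, \tag{15}\\
R_{i,0} &= \max\{R_{i-1,0},\ c_{1,i} + R_{i-2,0}\} \quad \text{for } 2 \le i \le n, \tag{16}\\
R_{0,0} &= R_{1,0} = R_{0,1} = 0, \quad R_{1,1} = r_{1,1} \text{ if } r_{1,1} > 0, \text{ and } R_{1,1} = 0 \text{ otherwise.} \tag{17}
\end{align}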


In Equation 12, the first (second) term in the right-hand side represents the case that
operation $J_{1,i}$ ($J_{2,j}$) is not coalesced with any other operation. The third (fourth) term
represents the case that operation $J_{1,i}$ ($J_{2,j}$) is coalesced with its immediate predecessor.
The fifth term represents the case that $J_{1,i}$ is coalesced with $J_{2,j}$. No other cases are
possible, because edges in the maximum weighted compatible matching set are not
allowed to cross each other due to precedence constraints. The other equations are
similarly derived. With initial values given by Equation 17, we can compute the value of
$R_{n,n}$, which is the maximum total reward, in $O(n^2)$ time.
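
As a rough illustration of Equations 12 to 17, the recurrences can be folded into a single table fill in Python by guarding each term with the index condition under which it is defined. The function names, the 1-indexed padding convention, and the two job sequences in the usage lines are hypothetical; only the reward values are those of Table 3.

```python
def max_coalescing_reward(c1, c2, r):
    """Maximum total reward for two jobs.

    c1[i] (i >= 2): reward for coalescing J1,i with J1,i-1
    c2[j] (j >= 2): reward for coalescing J2,j with J2,j-1
    r[i][j]       : reward for coalescing J1,i with J2,j
    All arrays are 1-indexed; lower indices are unused padding.
    """
    n1, n2 = len(c1) - 1, len(c2) - 1
    R = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            best = 0
            if i >= 1:                       # J1,i not coalesced
                best = max(best, R[i - 1][j])
            if j >= 1:                       # J2,j not coalesced
                best = max(best, R[i][j - 1])
            if i >= 2:                       # J1,i with its predecessor
                best = max(best, c1[i] + R[i - 2][j])
            if j >= 2:                       # J2,j with its predecessor
                best = max(best, c2[j] + R[i][j - 2])
            if i >= 1 and j >= 1:            # J1,i with J2,j
                best = max(best, r[i][j] + R[i - 1][j - 1])
            R[i][j] = best
    return R[n1][n2]


def rewards_from_jobs(job1, job2, table):
    """Build (c1, c2, r) from operation sequences and a reward table."""
    c1 = [0, 0] + [table[job1[k - 1]][job1[k]] for k in range(1, len(job1))]
    c2 = [0, 0] + [table[job2[k - 1]][job2[k]] for k in range(1, len(job2))]
    r = [[0] * (len(job2) + 1)]
    r += [[0] + [table[a][b] for b in job2] for a in job1]
    return c1, c2, r


# Reward values of Table 3 (op1..op4 mapped to indices 0..3).
TABLE = [[2, 7, 5, 0], [7, 2, 0, 4], [5, 0, 2, 1], [0, 4, 1, 2]]
# Hypothetical jobs, not the ones of Figure 4: (op1, op2, op4) and (op4, op3, op1).
print(max_coalescing_reward(*rewards_from_jobs([0, 1, 3], [3, 2, 0], TABLE)))  # 14
```

For these hypothetical jobs the value 14 is attained by coalescing the first two operations of job 1 (reward 7), the third operation of job 1 with the first of job 2 (reward 2), and the last two operations of job 2 (reward 5).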

4.2.2 An example 



        op1   op2   op3   op4
op1      2     7     5     0
op2      7     2     0     4
op3      5     0     2     1
op4      0     4     1     2

Table 3: Reward table of the example 

Figure 4 shows a real-time object with four primitive operations op1, op2, op3, and
op4.

Figure 4: An example of scheduling operations with precedence constraints

There are two periodic jobs and each has four operations. Job J1 consists of op1,
opi.opi, and (Ji,!, J 1 . 2 , Ji,3, Ji,^ respectively). Job J 2 consists of op 4 , opz, opi, 

and op 2 «^ 2 , 3 i >^ 2 , 4 . respectively). The reward values are given in Table 3. The 

maximum total reward is 16 and the scheduling sequence is J 2 ,i. J %2 + As, Ji,i 4- J 2 , 4 . 

Jl,2 + Jl,Z. 

4.3 Parallel algorithms for scheduling two jobs 

Our parallel algorithms are based on the dynamic programming formulation presented in
Section 4.2.1. We construct a layered graph for a given instance of the problem. We then
use the algorithm described in Section 2.3 for finding the optimum schedule.

Figure 5: Information flow in equations given in Section 4.2.1

Figure 5 shows the information flow of the equations given in Section 4.2.1. If we
try to construct a graph by placing $R_{i,j}$ in the $(i + j)$-th layer (say $l$), there will be edges
between layer $l$ and $l + 2$. This will not be a layered graph. In order to construct an
LDAG, we regroup the terms in the equations given in Section 4.2.1. In Equation 12, $R_{i,j}$
was calculated as the maximum of five terms. If we place $R_{i,j}$ in the $(i + j)$-th layer (say $l$),
then $R_{i-2,j}$, $R_{i,j-2}$ and $R_{i-1,j-1}$ will be in layer $l - 2$. So, we find the maximum of
these three terms and put it in layer $l - 1$, and then find the maximum among the remaining
two terms ($R_{i-1,j}$ and $R_{i,j-1}$) and the new term that we put in layer $l - 1$. Then, we will
have edges from $l$ to $l + 1$ only, thus forming an LDAG. For ease of representation we
replace $R_{i,j}$ with $R'_{2(i+1),2(j+1)}$. Thus we get the following Equations 18 and 19. Similarly,
we get Equations 20 and 21 from Equation 13, Equations 22 and 23 from Equation 14,
Equations 24 and 25 from Equation 15, and Equations 26 and 27 from Equation 16. Thus
we can rewrite the equations as follows:


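\begin{align}
R'_{2i,2j} &= \max\{R'_{2i-2,2j},\ R'_{2i,2j-2},\ R'_{2i-1,2j-1}\} \quad \text{for } 3 \le i \le n+1,\ 3 \le j \le n+1, \tag{18}\\
R'_{2i-1,2j-1} &= \max\{c_{1,i-1} + R'_{2i-4,2j},\ c_{2,j-1} + R'_{2i,2j-4},\ r_{i-1,j-1} + R'_{2i-2,2j-2}\} \notag\\
&\qquad \text{for } 3 \le i \le n+1,\ 3 \le j \le n+1, \tag{19}\\
R'_{4,2j} &= \max\{R'_{2,2j},\ R'_{4,2j-2},\ R'_{3,2j-1}\} \quad \text{for } 3 \le j \le n+1, \tag{20}\\
R'_{3,2j-1} &= \max\{c_{2,j-1} + R'_{4,2j-4},\ r_{1,j-1} + R'_{2,2j-2}\} \quad \text{for } 3 \le j \le n+1, \tag{21}\\
R'_{2i,4} &= \max\{R'_{2i,2},\ R'_{2i-2,4},\ R'_{2i-1,3}\} \quad \text{for } 3 \le i \le n+1, \tag{22}\\
R'_{2i-1,3} &= \max\{c_{1,i-1} + R'_{2i-4,4},\ r_{i-1,1} + R'_{2i-2,2}\} \quad \text{for } 3 \le i \le n+1, \tag{23}\\
R'_{2,2j} &= \max\{R'_{2,2j-2},\ R'_{1,2j-1}\} \quad \text{for } 3 \le j \le n+1, \tag{24}\\
R'_{1,2j-1} &= c_{2,j-1} + R'_{2,2j-4} \quad \text{for } 3 \le j \le n+1, \tag{25}\\
R'_{2i,2} &= \max\{R'_{2i-2,2},\ R'_{2i-1,1}\} \quad \text{for } 3 \le i \le n+1, \tag{26}\\
R'_{2i-1,1} &= c_{1,i-1} + R'_{2i-4,2} \quad \text{for } 3 \le i \le n+1, \tag{27}\\
R'_{2,2} &= R'_{4,2} = R'_{2,4} = 0, \quad R'_{4,4} = r_{1,1} \text{ if } r_{1,1} > 0, \text{ and } R'_{4,4} = 0 \text{ otherwise.} \tag{28}
\end{align}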




Figure 6: Information Flow in modified equations 

Figure 6 shows the information flow in the modified equations. In Equation 18, the first
(second) term in the right-hand side represents the case that operation $J_{1,i}$ ($J_{2,j}$) is not
coalesced with any other operation. In Equation 19, the first (second) term represents
the case that operation $J_{1,i}$ ($J_{2,j}$) is coalesced with its immediate predecessor. The third
term in Equation 19 represents the case that $J_{1,i}$ is coalesced with $J_{2,j}$. No other cases
are possible, because edges in the maximum weighted compatible matching set are not
allowed to cross each other due to precedence constraints. The other equations are
derived similarly. With initial values given by Equation 28, we can compute the value of
$R'_{2n+2,2n+2}$, which is the maximum total reward, in $O(n^2)$ time.

4.3.1 Layered graph construction 

Given an instance of the problem we can construct a layered graph $G = (V, E)$ where

1. $V = \{(i, j) : 1 \le i \le 2(n + 1),\ 1 \le j \le 2(n + 1) \text{ where } (i + j) \text{ is even}\} -
\{(1, 1), (3, 1), (1, 3)\}$.

2. Let us call the nodes of the form $(2i, 2j)$ even nodes and nodes of the form
$(2i + 1, 2j + 1)$ odd nodes. We can clearly see that nodes other than even and
odd are not possible.

3. The source and sink nodes between which we have to find the shortest path are
$S(G) = (2, 2)$ and $T(G) = (2(n + 1), 2(n + 1))$ respectively.

4. From Equation 18 we add edges from $(2i - 2, 2j)$, $(2i, 2j - 2)$ and $(2i - 1, 2j - 1)$
to $(2i, 2j)$ with costs 0. Similarly for Equation 19 we add the following edges to
$(2i - 1, 2j - 1)$: from $(2i - 4, 2j)$ with cost $c_{1,i-1}$, from $(2i, 2j - 4)$ with cost $c_{2,j-1}$
and from $(2i - 2, 2j - 2)$ with cost $r_{i-1,j-1}$. Thus we will have the following edges
for $1 \le i \le n + 1$ and $1 \le j \le n + 1$, from $(2i, 2j)$ to

    $(2i + 3, 2j - 1)$ with cost $c_{1,i+1}$,
    $(2i + 2, 2j)$ with cost 0,
    $(2i + 1, 2j + 1)$ with cost $r_{i,j}$,
    $(2i, 2j + 2)$ with cost 0,
    $(2i - 1, 2j + 3)$ with cost $c_{2,j+1}$;

for $1 \le i \le n$ and $1 \le j \le n$, there is an edge from
$(2i + 1, 2j + 1)$ to $(2i + 2, 2j + 2)$ with cost 0.

5. The number of layers is $2n + 1$.

6. Node $(i, j)$ will be in the $\left(\frac{i + j}{2} - 1\right)$-th layer.

We observe the following facts from the construction of the graph.

Fact 1 The number of nodes in any layer i is $2(n + 1 - |n + 1 - i|) + 1$ (at most $2n + 1$).
Fact 2 The maximum in-degree or out-degree of any node is 5.

4.3.2 $O(n)$ time optimal parallel algorithm

1. Construct an LDAG for a given instance of the problem as described in Section 4.3.1.

2. for 2n + 1 iterations do
begin
    remove all nodes at second layer;
end;

For removing nodes at the second layer, we allocate one processor to every node in the third
layer and find the shortest path from the source node to that node. Since the in-degree of
any node in the LDAG is 3, the number of paths from the source node to any node in the third
layer is 3. So, we can find the shortest path from the source node to any node in the third layer
in $O(1)$ time. Since there can be at most $2n + 1$ nodes in the third layer in any iteration
(the maximum number of nodes in any layer in the first iteration is $2n + 1$), we need $O(n)$
processors in each iteration. Since there are $2n + 1$ iterations, it will take $O(n)$ time.
Thus, we can find an optimal schedule for coalescing operations when the number of jobs is
two in $O(n)$ time with $O(n^2)$ work.

4.3.3 $O(\log^2 n)$ time parallel algorithm

The parallel algorithm for finding an optimal schedule is as follows:

1. Construct an LDAG for a given instance of the problem as described in Section 4.3.1.

2. for t = 0 to log(n + 1) + 1 do /* O(log n) iterations */
    for i = 2 to (number of layers in iteration t) - 1 do
    parbegin
        if i is even then remove nodes at layer i;
    parend;

The algorithm for removing all nodes at layer i is described below. Here, d is
the maximum in-degree (or out-degree) of any node in the current iteration. Clearly,
$d \le 2n + 1$.

for each node r of layer i - 1 do /* at most 2n + 1 times */
    for each node v at this layer i incident from r do /* at most d times */
        for each node p of layer i + 1 incident from v do /* at most d times */
        parbegin
            d(r, p) = max_v (d(r, v) + d(v, p))
        parend;
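
The layer-removal step can be sketched in Python as follows. This is a sequential simulation of the parbegin loops, not a PRAM implementation, and it assumes a hypothetical representation in which `edges[l]` is a dictionary mapping an edge `(u, v)` between consecutive layers to its reward; the function names are also hypothetical.

```python
def remove_even_layers(edges):
    """Remove every second layer of a layered DAG by (max, +) composition.

    edges[l] maps (u, v) -> weight for edges from layer l to layer l + 1
    (layers counted from 0 here, so layers 1, 3, 5, ... are removed).
    """
    out = []
    for l in range(0, len(edges) - 1, 2):
        first, second = edges[l], edges[l + 1]
        by_tail = {}                       # index the second edge set by its tail v
        for (v, p), w in second.items():
            by_tail.setdefault(v, []).append((p, w))
        merged = {}
        for (r, v), w1 in first.items():
            for p, w2 in by_tail.get(v, []):
                # d(r, p) = max over v of d(r, v) + d(v, p)
                if w1 + w2 > merged.get((r, p), float("-inf")):
                    merged[(r, p)] = w1 + w2
        out.append(merged)
    if len(edges) % 2 == 1:                # an unpaired last edge set is kept as is
        out.append(edges[-1])
    return out


def collapse(edges):
    """Repeat the removal until only source-to-sink edges remain."""
    while len(edges) > 1:
        edges = remove_even_layers(edges)
    return edges[0]


example = [
    {("s", "a"): 1, ("s", "b"): 2},
    {("a", "c"): 5, ("b", "c"): 1, ("b", "d"): 3},
    {("c", "e"): 2, ("d", "e"): 7},
    {("e", "t"): 1},
]
print(collapse(example))   # {('s', 't'): 13}
```

Each call halves the number of edge sets, so a graph with 2n + 1 layers collapses after O(log n) calls, matching the outer loop of the algorithm above.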


4.3.3.1 Analysis of Algorithm

Lemma 13 In the $k$-th iteration the layer adjacent to the first layer (the second layer) will contain
all nodes which were present in layer $2^{k-1} + 1$ before the first iteration.

Proof: In every iteration we will be removing all even layers. In the second iteration the second
layer is at distance 2 from the old first layer (before the first iteration), i.e. it will be the
third layer. After k iterations the second layer will be the layer that was at a distance
$2^k$ from the old layer 1 (before the first iteration). Thus the lemma follows. ∎

Lemma 14 At any point in time the degree of any node is less than or equal to the
degree of the source node of the graph.

Proof: Since the source node is an even node, its behavior represents the behavior of all
even nodes. We can observe from the construction of the graph that every odd node will
have only one edge leaving it, and that edge is to an even node. So, the degree of an odd
node in the $k$-th iteration will be equal to the degree of an even node in the previous iteration.
Thus the lemma follows. ∎

Lemma 15 The maximum degree of any node after the $k$-th iteration will be $\min(2^k +
3, 2n + 1)$.

Proof: By Lemma 14, it is enough to prove that the degree of the source
is $\min(2^k + 3, 2n + 1)$ in the $k$-th iteration. By Fact 1, the number of nodes in the first
iteration at layer i is at most $2i + 1$, and from Lemma 13 the second layer in the $k$-th iteration will
be layer $2^{k-1} + 1$ in the first iteration. Therefore, the number of nodes in the second layer after k
iterations will be $2 \cdot (2^{k-1} + 1) + 1 = 2^k + 3$. It is evident from the description of our graph
that every node is reachable from the source (this implies that every node in the second
layer is connected to the source) and the maximum number of nodes in any layer is at most
$2n + 1$, so the degree of a node in any iteration cannot exceed $2n + 1$. Therefore, the
degree of the source node is $\min(2^k + 3, 2n + 1)$ in the $k$-th iteration. Thus, the result follows. ∎

Lemma 16 For removing nodes at layer i in the $k$-th iteration, our algorithm on a CREW
PRAM requires $O(\log(\min(2n + 1, 2^k + 3)))$ time using $(2n + 1) \cdot \frac{d^2}{\log d}$
processors, where $d = \min(2n + 1, 2^k + 3)$.

Proof: For removing nodes at layer i we need to find $d(r, p)$ where $r \in$ layer $i - 1$
and $p \in$ layer $i + 1$. For finding $d(r, p)$ we need to find the minimum of $d(r, v) + d(v, p)$
where v is in layer i, r is incident on v, and v is incident on p. If d is the degree of r
then we need to find the minimum of d numbers; this will require $\frac{d}{\log d}$ processors and $\log d$
time on a CREW PRAM. Since there can be d nodes in layer $i + 1$ associated with v
and at most $(2n + 1)$ nodes in layer $i - 1$, the number of processors for removing nodes
at layer i is $(2n + 1) \cdot d \cdot \frac{d}{\log d}$. We have by Lemma 15 that the degree of any node in the
$k$-th iteration is $\min(2n + 1, 2^k + 3)$. Therefore, we need $(2n + 1) \cdot \frac{(\min(2n + 1, 2^k + 3))^2}{\log(\min(2n + 1, 2^k + 3))}$
processors and $O(\log(\min(2n + 1, 2^k + 3)))$ time for the $k$-th iteration. ∎

Theorem 6 An optimal schedule for coalescing operations when the number of jobs is two
can be found in $O(\log^2 n)$ time using $O(n^2 / \log n)$ processors on a CREW PRAM.

Proof: The innial number of layers in the graph is 2n + l. In every iteration we will be 
removing half ot layers present. Therefore, there will be layers in iteration. 

Combining this v;ith Lemma 16, A:"* iteration takes 

processors and i ^ ^!ognnn(2/i + 1. 2* + 3)j time. These expressions will be maximum 
when 2*= + 3 > 2 » + I or when k > log2(n — 1). Therefore, the number of processors in 
any iteration is Since each iteration takes at most C>(log(2n + 1)) time and there 
are Iog{n +1)4 i iterations, the time required for our algorithm is 0(log" n). Thus the 
result follows. I 

It takes 0(lt g log n) time to find minimum among n numbers on a common-CRCW 
PRAM with ~ processors. An optimal schedule for coalescing operations when 
number of jobs is two can be solved in O(lognloglogn) time using processors 

on a common-CRCW PRAM.
Chapter 5


Sub-Cubic Cost Algorithm for the 
All Pairs Longest Path Problem 
for Directed Acyclic Graph 

5.1 Introduction

The all pairs longest path (APLP) problem is to compute longest paths between all pairs
of vertices in a directed graph. The all pairs longest distance (APLD) problem is defined
similarly, with the word "paths" replaced by "distances". These problems are NP-hard
for general graphs. But for directed acyclic graphs, these can be solved in a manner
analogous to the all pairs shortest path problem. In this chapter, we modify the all pairs
shortest path algorithms given by Takaoka [40] to find all pairs longest paths for directed
acyclic graphs.

In this chapter, we design a parallel algorithm for the APLD problem, for a directed 
acyclic graph with unit edge costs, with O(log^ri) time, (worst case) and 
processors in Section 5.4.1. Next we describe the parallel algorithm for the APLP prob- 
lem, with general edge costs in Section 5.4.2. In Section 5.2, we give basic definitions, 
in Section 5.3, we give an algorithm for longest distance matrix multiplication (LDMM).

5.2 Basic definitions

A directed graph $G(V, E)$ has vertices $V = \{0, 1, \ldots, n - 1\}$ and edges $E$, a subset
of $V \times V$. The cost of edge $(i, j) \in E$ is denoted by $d_{ij}$; the matrix D has $(i, j)$ element
$d_{ij}$. We assume that $d_{ij} = -\infty$ if there is no edge from i to j and $d_{ii} = 0$. The cost,
or distance, of a path is the sum of the costs of the edges in the path. The length of a path is
the number of edges in the path. The longest distance from vertex i to vertex j is the
maximum cost over all paths from i to j, denoted by $d^*_{ij}$. Let $D^* = \{d^*_{ij}\}$.

Let A and B be $(n, n)$-matrices. The three products are defined using the elements
of A and B as follows:

Ordinary multiplication over a ring: $C = AB$, $\quad c_{ij} = \sum_{k=0}^{n-1} a_{ik} b_{kj}$ (29)

Boolean matrix multiplication: $C = A \cdot B$, $\quad c_{ij} = \bigvee_{k=0}^{n-1} (a_{ik} \wedge b_{kj})$ (30)

Longest distance matrix multiplication: $C = A \times B$, $\quad c_{ij} = \max_{0 \le k \le n-1} \{a_{ik} + b_{kj}\}$ (31)

The best known algorithm [13] for ordinary matrix multiplication has time complexity
$O(n^{\omega})$, $\omega = 2.376$. We can also use that algorithm for Boolean matrix multiplication.

If we have an algorithm for LDMM with $T_D(n)$ time, we can solve the APLD problem in
$O(T_D(n) \log n)$ time by repeated squaring. If we have an algorithm for LDMM with
witnesses (i.e. one that records the set of k which gives the maximum for $c_{ij}$) with
$T_P(n)$ time, we can solve the APLP problem in $O(T_P(n) \log n)$ time.
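
The repeated-squaring reduction can be sketched in a few lines of Python. This is a minimal cubic-time illustration of the definition in Equation 31, not the sub-cubic algorithm developed in this chapter; the names `ldmm` and `apld` are hypothetical, and the input is assumed to be the matrix D of a directed acyclic graph with $d_{ii} = 0$ and $-\infty$ for missing edges.

```python
NEG_INF = float("-inf")


def ldmm(A, B):
    """Longest distance matrix product: c_ij = max_k (a_ik + b_kj)."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apld(D):
    """All pairs longest distances of a DAG by repeated squaring."""
    n = len(D)
    result, length = D, 1      # result covers paths with at most `length` edges
    while length < n - 1:      # a path in a DAG has at most n - 1 edges
        result = ldmm(result, result)
        length *= 2
    return result


# DAG with edges 0->1 (2), 1->2 (3), 0->2 (4), 2->3 (1).
D = [[NEG_INF] * 4 for _ in range(4)]
for v in range(4):
    D[v][v] = 0
D[0][1], D[1][2], D[0][2], D[2][3] = 2, 3, 4, 1
print(apld(D)[0][3])   # 6, via 0 -> 1 -> 2 -> 3
```

Because $d_{ii} = 0$, squaring a matrix that covers paths of at most `length` edges yields one that covers paths of at most twice that many edges, so O(log n) products suffice.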

5.3 Longest distance matrix multiplication by divide-and-conquer

The longest distance matrix multiplication operation can also be viewed as computing $n^2$
inner products defined, for $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$, by

$$a \times b = \max_{1 \le k \le n} \{a_k + b_k\}.$$

Now we divide A, B and C into $(m, m)$-submatrices for $N = n/m$ as follows.

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & & \vdots \\ A_{N1} & \cdots & A_{NN} \end{bmatrix}, \quad
B = \begin{bmatrix} B_{11} & \cdots & B_{1N} \\ \vdots & & \vdots \\ B_{N1} & \cdots & B_{NN} \end{bmatrix}, \quad
C = \begin{bmatrix} C_{11} & \cdots & C_{1N} \\ \vdots & & \vdots \\ C_{N1} & \cdots & C_{NN} \end{bmatrix}$$

Then, C can be computed by

$$C_{ij} = \max_{k=1}^{N} \{A_{ik} \times B_{kj}\} \quad (i, j = 1, \ldots, N) \tag{32}$$

where the product of submatrices is defined in a similar manner to Equation 31, and
the "max" operation is defined on matrices by taking the "max" operation component-wise.
Since comparisons and additions of distances are performed in a pair, we can omit
counting the number of additions for measurement of complexity. We have to find $n^2$
maxima of N numbers ($O(n^2 N)$ time), and we have $N^3$ multiplications of distance matrices
in Equation 32. Let us assume that each multiplication of $(m, m)$ matrices can be done
in $T(m)$ computing time, assuming that a pre-computed table is available at no cost.
The time for constructing the table is given by Lemma 19, and is reasonable when m is
small. Then the total computing time is given by

$$O(n^2 N + N^3 T(m)) = O(n^3/m + (n/m)^3 T(m)). \tag{33}$$

By Lemma 17 we have $T(m) = O(m^{2.5})$. Therefore, the time given in Equation 33
becomes $O(n^3 / \sqrt{m})$.
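
The block decomposition of Equation 32 can be checked against the direct definition with a small Python sketch. The inner block product here is the plain cubic one, standing in for the table-lookup method; the names `ldmm`, `block` and `block_ldmm` are hypothetical, and m is assumed to divide n.

```python
NEG_INF = float("-inf")


def ldmm(A, B):
    """Direct longest distance product of two square matrices."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block(M, I, J, m):
    """The (m, m)-submatrix M_IJ (block indices counted from 0)."""
    return [row[J * m:(J + 1) * m] for row in M[I * m:(I + 1) * m]]


def block_ldmm(A, B, m):
    """C_IJ = component-wise max over K of A_IK x B_KJ."""
    n = len(A)
    assert n % m == 0
    N = n // m
    C = [[NEG_INF] * n for _ in range(n)]
    for I in range(N):
        for J in range(N):
            for K in range(N):                  # N^3 block products in total
                P = ldmm(block(A, I, K, m), block(B, K, J, m))
                for i in range(m):
                    for j in range(m):
                        C[I * m + i][J * m + j] = max(C[I * m + i][J * m + j], P[i][j])
    return C


A = [[(3 * i + 5 * j) % 7 for j in range(4)] for i in range(4)]
B = [[(2 * i * j + i) % 5 for j in range(4)] for i in range(4)]
print(block_ldmm(A, B, 2) == ldmm(A, B))   # True
```

The component-wise max over K accounts for the $O(n^2 N)$ term and the $N^3$ block products for the $N^3 T(m)$ term of Equation 33.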


5.3.x Distance matrix multiplication by table-lookup 


We divide (m,r? imatrices into strips for 3/ = m/l where ^/m <l < m as follows: 


~ Al ■ ■ ■ I 


B = 




Bm 


where .4i is a {vi.l) matrix, and Bj is a (f,m) matrix. We regard later .4 and B as 
(m, m)-submatrices in Equation 32. Now the product C — Ax B is given by 


C = X J5*} (34) 

By Lemma 18 we can compute Ak x Bk in O(i^m) time, assuming that a pre- 
computed table is available. Then, the right-hand side of Equation 34 can be computed 
in time 0{Mrn- + = Oim^l + Irn^). Setting I to this time becomes 

0(m^-®). 
Lemma 17 The time taken for multiplying $(m,m)$-submatrices is $O(m^{2.5})$.

Hereafter, we assume $l = \sqrt{m}$. Now we show that for an $(m,l)$ matrix $A$ and an $(l,m)$ matrix $B$, $A \times B$ can be computed in $O(l^2 m)$ time.

We assume that the lists of numbers $\{(a_{1r} - a_{1s}), \ldots, (a_{mr} - a_{ms})\}$ and $\{(b_{s1} - b_{r1}), \ldots, (b_{sm} - b_{rm})\}$ are already sorted for all $r$ and $s$ such that $1 \le r < s \le l$. Let $E_{rs}$ and $F_{rs}$ be the corresponding sorted lists. For each $r$ and $s$, we merge lists $E_{rs}$ and $F_{rs}$ to form list $G_{rs}$. This takes $O(l^2 m)$ time. The time for sorting is given by Lemma 20. Let $H_{rs}$ be the list of ranks of $(a_{ir} - a_{is})$, $i = 1, \ldots, m$, in $G_{rs}$, and let $L_{rs}$ be the list of ranks of $(b_{sj} - b_{rj})$, $j = 1, \ldots, m$, in $G_{rs}$. Let $H_{rs}[i]$ and $L_{rs}[j]$ be the $i$-th and $j$-th components of $H_{rs}$ and $L_{rs}$, respectively. Then we have

$$G_{rs}[H_{rs}[i]] = a_{ir} - a_{is}, \qquad G_{rs}[L_{rs}[j]] = b_{sj} - b_{rj}.$$

It takes $O(l^2 m)$ time to make the lists $H_{rs}$ and $L_{rs}$ for all $r$ and $s$. As observed by Fredman [16], we have $a_{ir} + b_{rj} \ge a_{is} + b_{sj}$ or, equivalently, $a_{ir} - a_{is} \ge b_{sj} - b_{rj}$. He observed that the ordering information for all $i$, $j$, $r$ and $s$ in the right-hand side of the above formula is sufficient to determine the product $A \times B$ by a pre-computed decision tree. We observe that to compute all components of $A \times B$ it is enough to know the above ordering for all $r$ and $s$:

$$a_{ir} - a_{is} \ge b_{sj} - b_{rj} \iff H_{rs}[i] \ge L_{rs}[j] \qquad (35)$$

We use the shorthand notation $a_{12} \ldots a_{l-1,l}$ to express the sequence $a_{12} \ldots a_{1l}\, a_{23} \ldots a_{2l} \ldots a_{l-1,l}$. The list $H[i] = H_{12}[i] \ldots H_{l-1,l}[i]$ is encoded into a single integer in lexicographic order [39], and $h(H[i])$ is assumed to represent that integer. Similarly, $h(L[j])$ represents the positive integer for the list $L[j] = L_{12}[j] \ldots L_{l-1,l}[j]$. The time for this encoding for all $H[i]$ and $L[j]$ is $O(l^2 m)$. Then the pre-computed table $I$ gives the desired index for the inner product, that is,

$$I[h(H[i]), h(L[j])] = k \iff k \text{ is the index for } \max_{k} \{a_{ik} + b_{kj}\} \qquad (36)$$

After the encoding we can compute the inner products in $O(m^2)$ time, assuming that the pre-computed table $I$ is available. Hence $A_k \times B_k$ in Equation 34 is computed in $O(l^2 m)$ time.

Lemma 18 The time taken for computing $A_k \times B_k$ in Equation 34 is $O(l^2 m)$.
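The rank-based comparison behind Equation 35 can be checked on a small instance. The following Python sketch is illustrative only: it assumes 0-based indices, uses `bisect_left` as the rank so that equal values receive equal ranks, and the helper name `rank_lists` is hypothetical.

```python
from bisect import bisect_left
from heapq import merge

def rank_lists(A, B, r, s):
    """Ranks H_rs[i] of a_ir - a_is and L_rs[j] of b_sj - b_rj in the merged list G_rs."""
    cols = range(len(B[0]))
    E = sorted(row[r] - row[s] for row in A)        # sorted list E_rs
    F = sorted(B[s][j] - B[r][j] for j in cols)     # sorted list F_rs
    G = list(merge(E, F))                           # merged list G_rs
    H = [bisect_left(G, row[r] - row[s]) for row in A]
    L = [bisect_left(G, B[s][j] - B[r][j]) for j in cols]
    return H, L

A = [[3, 1], [0, 5], [2, 2], [7, 4]]      # an (m, l) matrix with m = 4, l = 2
B = [[1, 4, 0, 2], [3, 3, 6, 1]]          # an (l, m) matrix
H, L = rank_lists(A, B, 0, 1)
# Comparing ranks decides which of the two candidate sums is larger.
agree = all((A[i][0] + B[0][j] >= A[i][1] + B[1][j]) == (H[i] >= L[j])
            for i in range(4) for j in range(4))
```

Only the relative order of the ranks is used, which is why the small integers $H_{rs}[i]$ and $L_{rs}[j]$ can stand in for the original differences.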




5.3.2 Computation of table I


Let $H = H_{12} \ldots H_{l-1,l}$ and $L = L_{12} \ldots L_{l-1,l}$ be any sequences of $l(l-1)/2$ integers whose values are between zero and $2m - 1$. Let $i$ and $j$ be positive integers representing $H$ and $L$ in lexicographic order, that is, $h(H) = i$ and $h(L) = j$. Then $I$ is defined by

$$I[i,j] = \begin{cases} k, & \text{if there is } k \text{ such that } H_{ks} \ge L_{ks} \text{ for all } s > k \text{ and } H_{rk} \le L_{rk} \text{ for all } r < k, \\ \text{undefined}, & \text{otherwise.} \end{cases}$$

This table can be used as table $I$ in Equation 36. The time for computing table $I$ is given by $O\left(\left((2m)^{l(l-1)/2}\right)^2 l^2\right) = O\left((2m)^{l(l-1)}\, l^2\right) = O(c^{m \log m})$ for a constant $c > 1$.

Lemma 19 The time taken for construction of table $I$ is $O(c^{m \log m})$, where $c > 1$ is a constant.
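To make the table definition concrete, the sketch below evaluates a single entry directly from the two sequences. It is a hedged illustration: indices are 0-based, the sequences are stored as dictionaries keyed by the pair $(r, s)$, raw differences stand in for the ranks (which preserves their order), and the names `table_entry` and `difference_keys` are hypothetical.

```python
from itertools import combinations

def table_entry(H, L, l):
    """Return k with H[k,s] >= L[k,s] for all s > k and H[r,k] <= L[r,k] for all r < k."""
    for k in range(l):
        after = all(H[(k, s)] >= L[(k, s)] for s in range(k + 1, l))
        before = all(H[(r, k)] <= L[(r, k)] for r in range(k))
        if after and before:
            return k
    return None  # the "undefined" case

def difference_keys(a, b):
    """H_rs = a_r - a_s and L_rs = b_s - b_r for one row a and one column b."""
    pairs = list(combinations(range(len(a)), 2))
    H = {(r, s): a[r] - a[s] for r, s in pairs}
    L = {(r, s): b[s] - b[r] for r, s in pairs}
    return H, L

a, b = [3, 7, 2], [4, 1, 9]          # sums a_k + b_k are 7, 8, 11
H, L = difference_keys(a, b)
k = table_entry(H, L, 3)
```

The returned index attains the maximum of $a_k + b_k$, which is the role the table plays in Equation 36.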


5.3.3 Determination of size of submatrices 

Let $m = \log n/(\log c \cdot \log\log n)$. Then the time for making the table $I$ of this size is easily shown to be $O(n)$, which can be absorbed in the main computing time. Substituting this value of $m$ into $O(n^3/\sqrt{m})$, we have the overall computing time for distance matrix multiplication

$$O\left(n^3 (\log\log n/\log n)^{1/2}\right).$$

The time for sorting to obtain the lists $E_{rs}$ and $F_{rs}$ in Section 5.3.1 is $O(m^{2.5} \log m)$. This task of sorting, which is called pre-sort, is done for all $A_{ij}$ and $B_{ij}$ in advance, taking

$$O\left((n/m)^2 m^{2.5} \log m\right) = O\left(n^2 (\log n \log\log n)^{1/2}\right)$$

time, where $m = O(\log n/\log\log n)$, which is absorbed into the above main complexity.

Lemma 20 The total time taken for sorting in Section 5.3.1 is $O\left(n^2 (\log n \log\log n)^{1/2}\right)$.


5.3.4 Parallelization 

We separate the computation in Equation 32 into two parts.
1. Computation of maxima of $N$ numbers.
2. Computation of products $A_{ik} B_{kj}$ $(i, j, k = 1, \ldots, N)$.

We can find the maxima of $N$ numbers in $O(\log n)$ time with $O(n^3/(m \log n)) = O(n^3 \log\log n/\log^2 n)$ processors on an EREW PRAM [39].

In the computation of $A_k B_k$ in Equation 34, there are $O(l^2)$ independent mergings of lists of length $m$. Such computation is done for $k = 1, \ldots, M$. There are also $M - 1$ "max" operations on $(m,m)$-matrices. It is obvious that there is a parallel algorithm for these operations with $O(M l^2 m) = O(m^{2.5})$ cost, with $O(m^{2.5}/(m \log m)) = O(m^{1.5}/\log m)$ processors, in $O(m \log m)$ time, where $l = \sqrt{m}$. The tasks of encoding [39] and table-lookup can be done within the above processor and time complexities. Since the $N^3$ products $A_{ik} B_{kj}$ can be computed in parallel, we have a parallel algorithm with $P = O\left((n/m)^3 m^{1.5}/\log m\right) = O\left(n^3 (\log\log n)^{1/2}/(\log n)^{3/2}\right)$ processors and $T = O(m \log m) = O(\log n)$, as $m = \Theta(\log n/\log\log n)$.

The lists $E_{rs}$ and $F_{rs}$ have to be broadcast to the computation of the products $A_{ik} B_{kj}$. This is done in $O(\log n)$ time with

$$O\left(n^3 (\log\log n)^{1/2}/(\log n)^{3/2}\right)$$

processors, since a datum can be broadcast to $N$ locations with $O(N/\log N)$ processors in $O(\log N)$ time. The tasks of table construction and pre-sort described above can be done within the above complexities. The complexities of computing the products $A_{ik} B_{kj}$ dominate those of the maximum computation, and those of pre-sort and table construction are much lower.


5.4 All pairs longest paths 

In this section, we give a serial algorithm for APLP for directed acyclic graphs with unit edge costs. This algorithm is analogous to the one given by Alon et al. [3]. Let $D^{(l)}$ be the $l$-th approximate matrix for $D^*$ defined by $d^{(l)}_{ij} = d^*_{ij}$ if $d^*_{ij} \le l$, and $d^{(l)}_{ij} = -\infty$ otherwise. Then we can compute $D^{(r)}$ by the following algorithm. Let $A$ be the adjacency matrix.


Algorithm 1: Longest distances by Boolean matrix multiplication
1. $A := \{a_{ij}\}$ where $a_{ij} = 1$ if $d_{ij} = 1$, and 0 otherwise;
2. $B := I$; {Boolean identity matrix}
3. for $l := 2$ to $r$ do begin
4.   $B := AB$; {Boolean matrix multiplication}
5.   for $i := 0$ to $n - 1$ do for $j := 0$ to $n - 1$ do
6.     if $b_{ij} = 1$ then $d^{(l)}_{ij} := l$ else $d^{(l)}_{ij} := -\infty$;
7.     if $d^{(l)}_{ij} = -\infty$ then $d^{(l)}_{ij} := d^{(l-1)}_{ij}$
8. end
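The idea of Algorithm 1 can be tried out with the following Python sketch. It is a simplified illustration under stated assumptions: the loop index is aligned so that `B` holds the Boolean power of the adjacency matrix with exactly `l` factors at iteration `l`, diagonal distances are taken as 0, and the function reports the longest path length among paths of at most `r` edges. The names `bool_mult` and `longest_distances_up_to` are hypothetical.

```python
NEG_INF = float("-inf")

def bool_mult(X, Y):
    """Boolean product of two n x n 0/1 matrices."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]

def longest_distances_up_to(adj, r):
    """Longest path lengths using at most r edges in a unit-cost DAG."""
    n = len(adj)
    B = [[int(i == j) for j in range(n)] for i in range(n)]    # Boolean identity
    D = [[0 if i == j else NEG_INF for j in range(n)] for i in range(n)]
    for l in range(1, r + 1):
        B = bool_mult(B, adj)       # B[i][j] = 1 iff some path has exactly l edges
        for i in range(n):
            for j in range(n):
                if B[i][j]:
                    D[i][j] = l     # a longer path overwrites a shorter one
    return D

# DAG with edges 0->1, 0->2, 1->2, 2->3
adj = [[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
D = longest_distances_up_to(adj, 3)
```

With `r = n - 1` the result contains the true longest distances, since no path in a DAG on `n` vertices has more than `n - 1` edges.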

Algorithm 2: Solving APLD
{Accelerating phase}
1. for $l := 2$ to $r$ do compute $D^{(l)}$ using Algorithm 1;
{Cruising phase}
2. $l := r$;
3. for $s := 1$ to $\lceil \log_{3/2} n/r \rceil$ do begin
4.   for $i := 0$ to $n - 1$ do
5.     Scan the $i$-th row of $D^{(l)}$ and find the smallest set of equal $d^{(l)}_{ij}$'s such that $\lceil l/2 \rceil \le d^{(l)}_{ij} \le l$, and let the set of corresponding indices $j$ be $S_i$;
6.   $l_1 := \lceil 3l/2 \rceil$;
7.   for $i := 0$ to $n - 1$ do for $j := 0$ to $n - 1$ do begin
8.     if $S_i \ne \emptyset$ then $m_{ij} := \max_{k \in S_i} \{d^{(l)}_{ik} + d^{(l)}_{kj}\}$;
9.     if $S_i = \emptyset$ then $m_{ij} := -\infty$;
10.    if $m_{ij} = -\infty$ then $d^{(l_1)}_{ij} := d^{(l)}_{ij}$
11.    else $d^{(l_1)}_{ij} := m_{ij}$
12.  end;
13.  $l := l_1$
14. end

Algorithm 2 computes $D^{(l)}$ from $l = 2$ to $r$ in the accelerating phase, spending $O(r n^{\omega})$ time, and computes $D^{(l)}$ for $l = r, \lceil \frac{3}{2} r \rceil, \lceil \frac{3}{2} \lceil \frac{3}{2} r \rceil \rceil, \ldots, n'$ by repeated squaring in the cruising phase, where $n'$ is the smallest integer in this series of $l$ such that $l \ge n$. The key observation in the cruising phase is that we only need to check $S_i$ at line 8, whose size is not large, that is, at most $2n/l$. Hence the computing time of one iteration beginning at line 3 is $O(n^3/l)$. Therefore, the time of the cruising phase is given, with $N = \lceil \log_{3/2} n/r \rceil$, by

$$O\left(\sum_{s=1}^{N} n^3/((3/2)^s r)\right) = O(n^3/r).$$

Balancing the two phases with $r n^{\omega} = n^3/r$, the time taken by Algorithm 2 becomes $O\left(n^{(3+\omega)/2}\right)$ with $r = O\left(n^{(3-\omega)/2}\right)$.

When we have a directed graph $G$ whose edge costs are between zero and $M$, where $M$ is a positive integer, we can convert the graph $G$ to $G' = (V', E')$ by adding auxiliary vertices $v_1, \ldots, v_{M-1}$ for each $v \in V$. The edge set is also modified to $E'$. If $c(v, w) = l$, $w$ is connected from $v_{l-1}$ in $E'$, where $v_0 = v$. Obviously we can solve the problem for $G$ by applying Algorithm 2 to $G'$, which takes $O\left((Mn)^{(3+\omega)/2}\right)$ time.
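The conversion to a unit-cost graph can be sketched as follows. This is a minimal illustration, assuming positive integer costs in the range 1 to $M$, a chain $v_0 \to v_1 \to \cdots \to v_{M-1}$ for each vertex, and a numbering in which copy $v_k$ receives index $v + kn$; the function names are hypothetical, and the longest distances are obtained here by simple repeated relaxation rather than by Algorithm 2.

```python
def to_unit_cost_graph(n, edges, M):
    """Replace each edge (v, w) of integer cost c in 1..M by the unit edge v_{c-1} -> w."""
    unit_edges = set()
    for v in range(n):
        for k in range(1, M):
            unit_edges.add((v + (k - 1) * n, v + k * n))   # chain edge v_{k-1} -> v_k
    for v, w, c in edges:
        unit_edges.add((v + (c - 1) * n, w))               # edge of cost c leaves v_{c-1}
    return n * M, sorted(unit_edges)

def longest_from(source, num_vertices, unit_edges):
    """Longest unit-edge path lengths from source in a DAG, by repeated relaxation."""
    dist = [float("-inf")] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices):          # enough passes for an acyclic graph
        for u, v in unit_edges:
            if dist[u] + 1 > dist[v]:
                dist[v] = dist[u] + 1
    return dist

# Weighted DAG on 3 vertices: 0 -(2)-> 1, 1 -(3)-> 2, 0 -(4)-> 2
size, unit_edges = to_unit_cost_graph(3, [(0, 1, 2), (1, 2, 3), (0, 2, 4)], 4)
dist = longest_from(0, size, unit_edges)
```

The entries of `dist` for the original vertices equal the weighted longest distances in the original graph, at the price of a graph with $Mn$ vertices.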

5.4.1 Parallelization for graphs with unit costs 

We design a parallel algorithm on an EREW PRAM for a directed acyclic graph with unit edge costs. Let $A$ be the adjacency matrix used in Algorithm 1. There is a path from $i$ to $j$ of length less than or equal to $l$ if and only if $A^l(i,j) = 1$, where $A^l$ is the $l$-th power of $A$ by Boolean matrix multiplication. By repeated squaring, we can get $A^l$ $(l = 1, 2, 4, \ldots, n')$ with $\lceil \log n \rceil$ Boolean matrix multiplications, where $n'$ is the smallest integer in this series of $l$ such that $l \ge n$. These matrices give a kind of approximate estimation of the path lengths.

Algorithm 3: Longest distances up to $2^R$

{Approximation phase}
1. $A^{(1)} := A$; $l := 1$;
2. for $s := 1$ to $R$ do begin
3.   $A^{(2l)} := A^{(l)} A^{(l)}$; {Boolean matrix multiplication}
4.   $l := 2l$
5. end;
{Gap filling phase}
6. $D^{(1)} := D$;
7. for $s := 1$ to $R$ do begin
8.   for $l := 2^{s-1} + 1$ to $2^s$ do begin
9.     $A^{(l)} := A^{(2^{s-1})} A^{(l - 2^{s-1})}$; {Boolean matrix multiplication}
10.    for $i := 0$ to $n - 1$ do for $j := 0$ to $n - 1$ do
11.      if $a^{(l)}_{ij} = 1$ then $b^{(l)}_{ij} := l$ else $b^{(l)}_{ij} := -\infty$
12.  end;
13.  for $i := 0$ to $n - 1$ do for $j := 0$ to $n - 1$ do
14.    $d_{ij} := \max_{2^{s-1} < l \le 2^s} \{b^{(l)}_{ij}\}$
15. end.

The computing time of this algorithm is $O(2^R n^{\omega})$. If we let $R = \lceil \log_2 n \rceil$, we can solve the APLD problem, but then the time becomes $O(n^{\omega + 1})$, which is not efficient. If, however, we substitute Algorithm 3 for Algorithm 1 in Algorithm 2, call the resulting algorithm Algorithm 2', and let $2^R = n^{(3-\omega)/2}$, we can solve the APLD problem in $O\left(n^{(3+\omega)/2}\right)$ time, which is on par with Algorithm 2. We can perform the $2^{s-1}$ multiplications in parallel at line 9. We can also compute the $2^{s-1}$ matrices at lines 10 and 11 in parallel. At line 14, we can find the maximum in $O(s)$ time with $O(2^s/s)$ processors.

In the cruising phase, we can find the maximum at line 8 in $O(\log n)$ time with $O(n/(l \log n))$ processors. The test is absorbed in these complexities. The summary of complexities is as follows:

Accelerating phase: $T = O(R \log n)$, $P = O(n^{\omega} 2^R)$

Cruising phase: $T = O(\log(n/2^R) \cdot \log n)$, $P = O(n^3/(2^R \log n))$

If we let $2^R = n^{(3-\omega)/2}$, we have the overall complexity as follows:

$T = O(\log^2 n)$, $P = O\left(n^{(3+\omega)/2}\right)$

If we have a graph with edge costs up to $M$, we can replace $n$ by $Mn$ in the above complexities.

5.4.2 Parallelization for graphs with general costs 

If edge costs are non-negative real numbers, we can apply the techniques in the previous sections. We use the longest distance matrix multiplication algorithm described in Section 5.3, and slightly modify the cruising phase [40]. In Algorithm 2, there is no difference between distances and lengths of paths since the edge costs are all one. In line 5 of Algorithm 2, we choose the set $S_i$ based on the distances $d^{(l)}_{ij}$ $(j = 0, \ldots, n-1)$ satisfying $\lceil l/2 \rceil \le d^{(l)}_{ij} \le l$. To guarantee the correct computation of distances here, the choice must be based on path lengths, not distances. If we keep track of path lengths, we can adapt Algorithm 2 here. The definition of $d^{(l)}_{ij}$ here is that it gives the cost of the longest path whose length is not greater than $l$. The algorithm follows, with a new data structure, array $Q^{(l)}$, such that $q^{(l)}_{ij}$ is the length of a path that gives $d^{(l)}_{ij}$.

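Algorithm 4

{Accelerating phase}
1. $l := 1$; $D^{(1)} := D$; $Q^{(1)} := \{q^{(1)}_{ij}\}$ where
   $q^{(1)}_{ij} = 0$, if $i = j$;
   $q^{(1)}_{ij} = 1$, if $i \ne j$ and $(i,j) \in E$;
   $q^{(1)}_{ij} = -\infty$, otherwise;
2. for $s := 1$ to $\lceil \log_2 r \rceil$ do begin
3.   $D^{(2l)} := D^{(l)} \times D^{(l)}$; {distance matrix multiplication}
4.   $q^{(2l)}_{ij} := q^{(l)}_{ik} + q^{(l)}_{kj}$, if $d^{(2l)}_{ij}$ is updated by $d^{(l)}_{ik} + d^{(l)}_{kj}$;
     $q^{(2l)}_{ij} := q^{(l)}_{ij}$, otherwise;
5.   $l := 2l$
6. end;
{Cruising phase}
7. for $s := 1$ to $\lceil \log_{3/2} n/r \rceil$ do begin
8.   for $i := 0$ to $n - 1$ do
9.     Scan the $i$-th row of $Q^{(l)}$ and find the smallest set of equal $q^{(l)}_{ij}$'s such that $\lceil l/2 \rceil \le q^{(l)}_{ij} \le l$, and let the set of corresponding indices $j$ be $S_i$;
10.  $l_1 := \lceil 3l/2 \rceil$;
11.  for $i := 0$ to $n - 1$ do for $j := 0$ to $n - 1$ do begin
12.    if $S_i \ne \emptyset$ then begin
13.      $m_{ij} := \max_{k \in S_i} \{d^{(l)}_{ik} + d^{(l)}_{kj}\}$;
14.      $k :=$ one that gives the above maximum and satisfies that $q^{(l)}_{ik} + q^{(l)}_{kj}$ is maximum among such $k$;
15.      $L := q^{(l)}_{ik} + q^{(l)}_{kj}$;
16.    end else {$S_i = \emptyset$} $L := -\infty$; $m_{ij} := -\infty$
17.    $d^{(l_1)}_{ij} := m_{ij}$; $q^{(l_1)}_{ij} := L$
18.    if $d^{(l_1)}_{ij} = -\infty$ then begin $d^{(l_1)}_{ij} := d^{(l)}_{ij}$; $q^{(l_1)}_{ij} := q^{(l)}_{ij}$ end
19.  end;
20.  $l := l_1$
21. end.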

We index the time $T$ and the number of processors $P$ in the accelerating phase and the cruising phase by 1 and 2. Then we have

$T_1 = O(\log r \log n)$, $P_1 = O\left(n^3 (\log\log n)^{1/2}/(\log n)^{3/2}\right)$.

The computation of all $S_i$ can be done in $O(\log n)$ time with $O(n^2)$ processors. The dominant complexity in the cruising phase is at line 13. This part can be computed in $O(\log(n/r))$ time with $O((n/r)/\log(n/r))$ processors. Thus we have

$T_2 = O(\log n \log(n/r))$, $P_2 = O\left(n^2 (n/r)/\log(n/r)\right)$.

Letting $r = (\log n/\log\log n)^{3/2}$ yields

$T_1 = O(\log n \log\log n)$, $P_1 = O\left(n^3 (\log\log n)^{1/2}/(\log n)^{3/2}\right)$,

$T_2 = O(\log^2 n)$, $P_2 = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{5/2}\right)$.

Thus the cost is given by

$$P_1 T_1 + P_2 T_2 = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) + O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = o(n^3).$$

The APLP problem can also be solved within the above stated complexities. The only thing we need is to keep track of witnesses at the distance matrix multiplication and at the maximum operation at line 13. Broadcasting $N$ items can be done in $O(1)$ time on a minimum-CRCW PRAM using $O(N)$ processors, and finding the minimum among $N$ numbers can be done in $O(1)$ time with $O(N)$ processors on a minimum-CRCW PRAM, or in $O(\log\log n)$ time with $O(N/\log\log N)$ processors on a common-CRCW PRAM. Hence the time will be $O(\log n)$ on a minimum-CRCW PRAM and $O(\log n \log\log n)$ on a common-CRCW PRAM. The above algorithm has $o(n^3)$ cost on a minimum-CRCW PRAM.
Chapter 6


Scheduling Interval Ordered Tasks
In Parallel

6.1 Introduction

The problem of scheduling unit execution time tasks on a number of processors under arbitrary constraints has been studied extensively in the past. When $m$, the number of processors, is part of the input, the problem is NP-hard [42]. There are a number of efficient algorithms for the case when $m = 2$ [11]. But the problem is still open for the 3-processor case [5]. Moreover, polynomial time algorithms exist when $m$ is part of the input and the precedence constraints are trees [25] or interval orders [32]. The main algorithmic tool employed in obtaining polynomial time sequential algorithms for solving these problems is known as list scheduling. Briefly, the method is as follows:

Form a priority list of tasks and construct a schedule iteratively by choosing a maximal set of $r \le m$ independent tasks (tasks with no precedence constraints among them) of highest priority in each iteration.

Sunder et al. have shown that the list scheduling problem is P-complete, and hence it is unlikely to be parallelizable. Helmbold and Mayr [22] showed that the construction of the list schedule (with $m = 2$, arbitrary execution times for tasks and empty precedence constraints) is P-complete, and hence unlikely to be parallelizable. However, parallel algorithms are known for the 2-processor case. Helmbold and Mayr [23] presented the first NC algorithm, and Vazirani and Vazirani [43] presented an RNC algorithm for the problem. Sunder et al. [38] have proposed the first NC algorithm for scheduling interval ordered tasks on a priority-CRCW PRAM. Their algorithm runs in $O(\log^2 n)$ time with $O(n^5 \log^2 n)$ cost.

In this chapter, we present an NC algorithm for scheduling interval ordered tasks on $m$ machines. The sequential algorithm for this problem, proposed by Papadimitriou et al., is based on the list scheduling method [32]. Our algorithm makes use of structural properties of interval orders, and some techniques developed by Bartusch et al. [5]. Our parallel algorithm for scheduling interval orders constructs the same schedule as the one produced by the sequential list scheduling algorithm.

6.2 Basic Definitions 

Let $G = (V, A)$ be a partial order (or equivalently a transitive acyclic directed graph) consisting of $|V|$ nodes. We refer to $G$ as a precedence constraint graph and to the elements of $V$ as tasks. A node $u$ is a successor of a node $v$ if there is a directed path from $v$ to $u$ in $G$. The set of successors of $v$ is denoted by $N_G(v)$ (or simply $N(v)$ if the context is clear). $v$ is a maximal node if $N(v) = \emptyset$.

A schedule of length $t$ for $G$ on $m$ processors is an $m \times t$ matrix $S$, where the columns are indexed by $1, \ldots, t$ and the rows are indexed by $1, \ldots, m$. Each task $x$ is assigned to a unique entry $(p(x), t(x))$ in $S$ such that $(x, y) \in A$ implies $t(x) < t(y)$. For any task $x \in V$, the entry $(p(x), t(x))$ denotes that task $x$ is scheduled on processor $p(x)$ at time instant $t(x)$. No two tasks are assigned to the same entry in $S$. The length of $S$ is denoted by $\|S\|$. An entry in $S$ is also called a slot. A slot of $S$ to which no task is assigned is said to be empty. Two schedules $S_1$ and $S_2$ for $G$ are considered to be the same if for every task $x$ in $G$, the column assigned to $x$ in $S_1$ is the same as the column assigned to $x$ in $S_2$. This is because the processors are identical; it is irrelevant which processor is actually assigned to the task. We denote the subschedule of $S$ consisting of columns $i, i+1, \ldots, j$ $(1 \le i \le j \le t)$ by $S[i, j]$. Let $S'$ and $S''$ be two schedules of size $m \times t_1$ and $m \times t_2$ for two partial orders $G_1$ and $G_2$. The concatenation of $S'$ and $S''$, denoted by $S' \oplus S''$, is the schedule of size $m \times (t_1 + t_2)$ obtained by concatenating the two matrices $S'$ and $S''$. Given a precedence constraint graph $G$, a list $L$ in $G$ is said to be precedence preserving if for any vertices $u$ and $v$ in $G$, $u$ being a successor of $v$ implies that $v$ precedes $u$ in $L$.

Definition 21 [32] An interval order is a partial order $G = (V, A)$ where $V$ is a set of intervals on the real line and $(u, v) \in A$ iff $x \in u$, $y \in v$ implies $x < y$.

The left and the right endpoint of an interval $u \in V$ are denoted by $l(u)$ and $r(u)$, respectively. We say $u$ precedes $v$ if $(u, v) \in A$. We assume, without loss of generality, that all endpoints are distinct. For each $v \in V$, clearly the set of neighbors of $v$ in $G$ is $N_G(v) = \{u \in V \mid l(u) > r(v)\}$. The interval order corresponding to a list $L$ of intervals is denoted by $G(L)$. For any two lists $L_1$ and $L_2$, let $L_1 \cdot L_2$ denote the concatenation of the two lists. The cardinality of a list $L$ is denoted by $|L|$. For $1 \le i \le j \le |L|$, $L(i)$ denotes the $i$-th element of $L$ and $L(i, \ldots, j)$ denotes the sublist of $L$ consisting of the elements $L(i), L(i+1), \ldots, L(j)$.
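As a small illustration of these definitions, the successor sets of an interval order can be computed directly from the endpoints. The sketch below assumes intervals given as `(left, right)` pairs with distinct endpoints and identifies each interval by its position in the list; the function name `interval_order` is hypothetical.

```python
def interval_order(intervals):
    """Successor sets N(v) = {u | l(u) > r(v)} for a list of (left, right) intervals."""
    return {v: {u for u, (left_u, _) in enumerate(intervals) if left_u > right_v}
            for v, (_, right_v) in enumerate(intervals)}

intervals = [(0, 2), (1, 4), (3, 6), (5, 7)]
succ = interval_order(intervals)
maximal = [v for v in succ if not succ[v]]   # nodes with an empty successor set
```

Here interval 0 precedes intervals 2 and 3, interval 1 precedes only interval 3, and intervals 2 and 3 are maximal nodes.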

Let $L$ be a list of intervals and let $u \notin L$ be an arbitrary interval. Let $S$ be the schedule constructed from $G(L)$ with $m$ processors. A column in $S$ is incomplete if fewer than $m$ tasks are scheduled in it. We say an incomplete column $c$ in $S$ is feasible for $u$ if $S$ does not contain any interval that precedes $u$ in any column $d \ge c$. An empty slot in an incomplete column which is feasible for $u$ is said to be available for $u$.

6.3 Sequential Algorithm 

In this section, we first describe the list scheduling algorithm proposed by Coffman [10].

Algorithm: List-Schedule(G, L, m).

Inputs: An arbitrary precedence constraint graph $G = (V, A)$, a precedence-preserving list $L$ of the tasks $V$, and $m$, the number of processors.

1. $t = 1$;
2. while $L \ne \emptyset$ do
   (a) Initialize an empty set $L'$.
   (b) Put the first node $v$ of $L$ into $L'$. Scan $L$ from left to right. When a node $w$ is scanned, if $w$ is independent of every node in $L'$ (namely, for any $u \in L'$, $(u, w) \notin A$ and $(w, u) \notin A$), then include $w$ into $L'$. Repeat this process until either $L'$ contains $m$ nodes or all nodes of $L$ are considered.
   (c) Schedule the nodes in $L'$ in column $t$.
   (d) $L \leftarrow L - L'$; $t \leftarrow t + 1$.
3. Output the schedule constructed.
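The steps of List-Schedule can be sketched in Python as follows. This is an illustrative version, assuming tasks are hashable identifiers, `succ[v]` is the transitively closed successor set of `v`, and the schedule is returned as a list of columns; the function name `list_schedule` is hypothetical.

```python
def list_schedule(order, succ, m):
    """Greedy list scheduling; returns the columns of the schedule in order.

    order -- precedence-preserving list of tasks
    succ  -- succ[v] is the set of successors of task v
    m     -- number of processors
    """
    remaining = list(order)
    columns = []
    while remaining:
        chosen = []                       # the set L' of the current iteration
        for w in remaining:               # scan the list from left to right
            if all(w not in succ[u] and u not in succ[w] for u in chosen):
                chosen.append(w)          # w is independent of every node in L'
                if len(chosen) == m:
                    break
        columns.append(chosen)            # schedule L' in the next column
        remaining = [v for v in remaining if v not in chosen]
    return columns

succ = {0: {2, 3}, 1: {3}, 2: set(), 3: set()}
schedule = list_schedule([0, 1, 2, 3], succ, 2)
```

The first task of the remaining list is always selected, so every iteration fills at least one slot and the loop terminates.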

The list scheduling problem for an arbitrary precedence graph $G = (V, A)$, a precedence-preserving list $L$, and the number of processors $m$ is to compute List-Schedule(G, L, m). Sunder et al. [38] have shown that this is P-complete under NC reduction. We now describe the sequential algorithm for scheduling interval ordered tasks, proposed by Papadimitriou et al. [32]. They have described an implementation that runs in $O(|V| + |A|)$ time.

Algorithm: Sequential-Schedule(L, m).

Inputs: $L$, a list of intervals representing the interval order $G = (V, A)$, and the number of processors $m$.

1. Sort the list $L$ of intervals by increasing order of right endpoints.
2. Compute $S$ = List-Schedule(G(L), L, m).

6.4 Parallel Algorithm 

In this section, we present an NC algorithm for scheduling interval orders. Our algorithm is based on the NC algorithm proposed by Sunder et al. [38]. They reduced the problem of constructing the optimal schedule to the problem of computing the optimal schedule length, and gave an algorithm for computing the optimal schedule length that runs in $O(\log n)$ time with $O(n^3)$ processors on a priority-CRCW PRAM. In this section, we present an $O(\log n)$ time algorithm with $o(n^3)$ cost on a minimum-CRCW PRAM for the same problem and use their parallel algorithm for constructing the optimal schedule.

6.4.1 Computing optimal schedule length 

Bartusch et al. [5] defined a bounding function $b$ on the node set $V$; $b(v)$ is an upper bound on the column number by which the node $v$ must be scheduled in any schedule of length $t$ for $G$. They have also shown that if there is a schedule of length at most $t$ for $G$ on $m$ processors, it can be constructed by forming a list $L$ of the nodes sorted by nondecreasing order of the bounds and then computing the list schedule associated with $L$. The representation theorem of Bartusch et al. [5] states:

Theorem 7 [5] There exists a schedule for the interval order $G$ on $m$ processors with length $t$ iff for every $i$ with $1 \le i \le t$, $|\{u \mid b(u) \le i\}| \le m \cdot i$.
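The counting condition of Theorem 7 is easy to test once the bounds are known. The sketch below assumes the bounds are supplied as a plain list of integers; the function name `schedule_exists` is hypothetical.

```python
def schedule_exists(bounds, m, t):
    """Check |{u : b(u) <= i}| <= m * i for every 1 <= i <= t."""
    return all(sum(1 for b in bounds if b <= i) <= m * i
               for i in range(1, t + 1))

feasible = schedule_exists([1, 1, 2, 2], m=2, t=2)     # two tasks per column fit
infeasible = schedule_exists([1, 1, 1, 2], m=2, t=2)   # three tasks need column 1
```

In the second call three tasks must be placed by column 1 while only two slots exist there, so the condition fails.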

Lemma 21 [5] Let $x$ and $y$ be two nodes in an interval order $G$.

(1) If $r(x) < r(y)$ then $b(x) \le b(y)$.

(2) If $b(x) < b(y)$ then $r(x) < r(y)$.

The bounding function $b$ cannot be computed in parallel. Sunder et al. later defined another bounding function $b'$ such that Theorem 7 holds with respect to $b'$ for interval ordered tasks, and which can be computed in parallel. The algorithm described by them is as follows [38]:

Algorithm: Bound(G = (V_G, E_G), m, t).

1. Define a directed acyclic graph $D = (V_D, E_D, w)$ with integral edge weights $w$ as follows:
   (a) $V_D = V_G \cup \{s\}$, where $s$ is a new sink node.
   (b) $E_D = E_G \cup \{(v, s) \mid v \in V_G\}$.
   (c) Define $w(v, s) = 0$ for all $v \in V_G$.
   (d) For every $(v, u) \in E_D$ such that $u \ne s$, compute
       $N(v, u) = \{x \in N_G(v) \mid r(x) \le r(u)\}$,
       $w(v, u) = \lceil |N(v, u)|/m \rceil$.
2. For all $v \in V_D$, find $d(v)$, the length of the longest path in $D$ from $v$ to the sink $s$.
3. For every $v \in V_G$, compute $b'(v) = t - d(v)$.
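Steps 2 and 3 of Bound amount to a longest-path computation towards the sink. The following sequential Python sketch shows the idea; it takes the edge weights as given input (the values in the example are purely illustrative), treats the zero-weight edges to the sink implicitly, and uses hypothetical names throughout.

```python
from functools import lru_cache

def bound_values(succ, weight, t):
    """b'(v) = t - d(v), where d(v) is the longest weighted path from v to the sink.

    succ[v]        -- successors of v in the interval order
    weight[(v, u)] -- integral weight of edge (v, u); edges to the sink weigh 0
    """
    @lru_cache(maxsize=None)
    def d(v):
        # The zero-weight edge (v, sink) makes 0 a lower bound on d(v).
        return max([0] + [weight[(v, u)] + d(u) for u in succ[v]])

    return {v: t - d(v) for v in succ}

succ = {0: {2, 3}, 1: {3}, 2: set(), 3: set()}
weight = {(0, 2): 1, (0, 3): 1, (1, 3): 1}      # illustrative weights only
bounds = bound_values(succ, weight, t=2)
```

Maximal nodes obtain the bound $t$, and every other node is pushed earlier by the weight accumulated along its heaviest path to the sink.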

The complexity of the above algorithm is dominated by Step 2, which involves the computation of all pairs longest distances of a directed acyclic graph. For this step Sunder et al. have proposed an algorithm that takes $O(\log n)$ time with $O(n^3 \log n)$ cost. But this step can be done in $O(\log n)$ time with $O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = o(n^3)$ cost using the algorithm presented in Chapter 5, on a minimum-CRCW PRAM.

Lemma 22 [38] We can replace $b(v)$ by $b'(v)$ in Theorem 7.

The following algorithm [38] computes the length of an optimal schedule for an interval order $G$.

Algorithm: Length(L, m).

1. As in the algorithm Bound, construct the edge weighted graph $D = (V_D, A_D, w)$ from $G(L)$, and compute, for each node $u$ of $G(L)$, the longest distance $d(u)$ in $D$.
2. In parallel, for every $1 \le t \le n$, check if the following conditions are true: for every $1 \le i \le t$, $|\{u \mid b'(u) = t - d(u) \le i\}| \le m \cdot i$.
3. Output the smallest $t$ such that the foregoing conditions are true.
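Steps 2 and 3 of Length can be imitated sequentially once the longest distances $d(u)$ are available. The sketch below assumes they are passed in as a list and tries the candidate lengths one after another rather than in parallel; the function name `schedule_length` is hypothetical.

```python
def schedule_length(d, m):
    """Smallest t with |{u : t - d[u] <= i}| <= m * i for every 1 <= i <= t."""
    n = len(d)
    for t in range(1, n + 1):             # candidate lengths, tried sequentially here
        if all(sum(1 for du in d if t - du <= i) <= m * i
               for i in range(1, t + 1)):
            return t
    return None                           # only reached for an empty task set

length_two_procs = schedule_length([1, 1, 0, 0], m=2)
length_one_proc = schedule_length([1, 1, 0, 0], m=1)
```

With one processor the four tasks need four columns, whereas two processors suffice to finish them in two.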

The complexity of the algorithm is dominated by Step 1, which takes $O(\log n)$ time with $o(n^3)$ cost. Steps 2 and 3 take at most $O(\log n)$ time with $O(n^2)$ processors.

6.4.2 Parallel algorithm for constructing optimal schedule

In this section, we describe a parallel algorithm [38] that constructs the list schedule for an interval order $G = (V, A)$. Let $L$ be the list of the tasks of $V$ in increasing order of right endpoints, and SeqS = Sequential-Schedule(L, m) = List-Schedule(G(L), L, m). Their parallel algorithm constructs SeqS using the divide and conquer technique. Suppose $\|SeqS\| = t$ and $t_1 = \lfloor t/2 \rfloor$. Let $S' = SeqS[1, t_1]$ and $S'' = SeqS[t_1 + 1, t]$; we construct $S'$ and $S''$ in parallel. By Lemma 21, there exists an integer $i \le n$ such that all tasks in $L(1, \ldots, i)$ (and possibly some other tasks) are scheduled in the columns of $S'$ and each column of $S'$ contains at least one task from $L(1, \ldots, i)$ [38]. Sunder and He have described an algorithm to find all tasks that can be scheduled in the first $t_1$ columns. The algorithm proposed in [38] for finding the optimal schedule is as follows:

Algorithm: Parallel-Schedule(L, m).

1. For $i := 1$ to $n$ do compute Length($L(1, \ldots, i)$, m). Let $t$ = Length($L(1, \ldots, n)$, m).
2. If $t = 1$, then schedule all tasks of $L$ in one column and return; else, let $s$ be the integer such that Length($L(1, \ldots, s)$, m) $\le t/2$ and Length($L(1, \ldots, s+1)$, m) $> t/2$.
3. $L_1 = L(1, \ldots, s)$; $L_2 = L(s+1, \ldots, n)$.
4. Let $u$ be the last interval in $L_1$. Define $L_3 = \{x \in L_2 \mid l(x) < r(u)\}$. {$L_3$ is the set of intervals in $L_2$ that might be scheduled with the intervals in $L_1$. These are pairwise incomparable.}
5. $X$ = Jump($L_1$, $L_3$, m). {Jump is a function defined later. It returns the subset $X$ of $L_3$ that might be scheduled with $L_1$.}
6. $L' = L_1 \cup X$ and $L'' = L_2 - X$.
7. $S'$ = Parallel-Schedule($L'$, m) and $S''$ = Parallel-Schedule($L''$, m).
8. Output $S' \oplus S''$.

Let $L_1 = L(1, \ldots, s)$, $L_2 = L(s+1, \ldots, n)$, and let $u = L(s)$. Let $SeqS_1$ = List-Schedule($L_1$, m). Let $\|SeqS_1\| = t_1$. Then let us define the following sets for some $x \in L_3$ that occurs at position $k$ of $L$.

$A(x) = \{y \in L(s+1, \ldots, k-1) \mid l(y) < l(x) < r(u) < r(y) < r(x)\}$.

$B(x) = \{y \in L(s+1, \ldots, k-1) \mid l(x) < l(y) < r(u) < r(y) < r(x)\}$.

$C(x) = \{y \in L(s+1, \ldots, k-1) \mid r(u) < l(y) \text{ and } r(y) < r(x)\}$.

$A(x)$, $B(x)$, and $C(x)$ are sorted by increasing order of right endpoints.

Algorithm: Jump($L_1$, $L_3$, m)

1. For every $x \in L_3$ do compute $A(x)$ and $B(x)$.
2. For every $x \in L_3$ do
   (a) $S_1(x)$ = the number of empty slots available for $x$ in $SeqS_1[c(x), t_1]$,
   (b) $S_2(x)$ = the number of empty slots available for $x$ in $SeqS_1(x)[c(x), t_1]$. {$SeqS_1(x)$ = List-Schedule($L_1 \cdot A(x)$, m)}
3. For every $x \in L_3$, let $B(x) = \{b_1, \ldots, b_{|B(x)|}\}$, sorted by increasing order of left endpoints.
4. Return $X = \{x \mid (\exists j : 1 \le j \le |B(x)| : S_2(x) - S_1(b_j) > j) \text{ or } (S_2(x) > |B(x)|)\}$.

We now describe how Step 2 of Jump can be implemented [38]. For $1 \le i \le n$, define the list $C(x, i) = \{c_1, \ldots, c_i\}$, where each $c_j$ $(1 \le j \le i)$ is a copy of the interval $x$. Because $S_1(x)$ is the number of empty slots in $SeqS_1[c(x), t_1]$, evaluating $S_1(x)$ is equivalent to finding the largest $i$ such that the intervals in $C(x, i)$ can be scheduled in empty slots of $SeqS_1[c(x), t_1]$. Hence Length($L_1 \cdot C(x, i)$, m) $> t_1$ iff $S_1(x) < i$. Similarly, because $S_2(x)$ is the number of empty slots in $SeqS_1(x)[c(x), t_1]$, evaluating $S_2(x)$ is equivalent to finding the largest $i \le n$ such that the intervals in $C(x, i)$ can be scheduled in empty slots in $SeqS_1(x)[c(x), t_1]$. There are two cases.

Case 1. Suppose $\|SeqS_1(x)\| > t_1$. Then at least one node in $A(x)$ is scheduled in $SeqS_1(x)$ in a column $c' > t_1$. Because every node in $A(x)$ must be scheduled before the node $x$, there are no empty slots available for $x$ in $SeqS_1(x)[c(x), t_1]$. Hence $S_2(x) = 0$.

Case 2. Suppose $\|SeqS_1(x)\| = t_1$. We can evaluate $S_2(x)$ using the following condition: Length($L_1 \cdot A(x) \cdot C(x, i)$, m) $> t_1$ iff $S_2(x) < i$.

Thus for each $x \in L_3$, $S_1(x)$ and $S_2(x)$ can be computed either by making $O(n)$ parallel calls to the procedure Length or by making $O(\log n)$ calls in sequence by doing a binary search [38]. Sunder et al. have stated that the above algorithm takes $O(\log^3 n)$ time with $O(n^4 \log^3 n)$ cost or, alternatively, $O(\log^2 n)$ time with $O(n^5 \log^2 n)$ cost. We first perform a better analysis of the algorithm and reduce the cost by a factor of $O(\log n)$ in both implementations. Then we use a better algorithm for computing the length of the schedule, as described in Section 6.4.1, and further reduce the cost by another factor of $\log n$.
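The sequential variant, which finds the largest feasible $i$ with a logarithmic number of calls, can be sketched as an exponential search followed by a binary search. In this hedged illustration the predicate `fits` is a hypothetical stand-in for a test such as "Length($L_1 \cdot C(x, i)$, m) does not exceed $t_1$"; it is assumed to be monotone and true for $i = 0$.

```python
def largest_feasible(fits, n):
    """Largest i in [0, n] with fits(i) true, for a monotone predicate fits."""
    hi = 1
    while hi <= n and fits(hi):      # exponential phase: double until a failure
        hi *= 2
    lo = hi // 2                     # fits(lo) holds (lo = 0 if fits(1) failed)
    hi = min(hi, n + 1)              # smallest index known not to be feasible
    while hi - lo > 1:               # binary phase on the open interval (lo, hi)
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo

calls = []
def fits(i):
    calls.append(i)                  # record each call to the predicate
    return i <= 5                    # pretend exactly 5 copies of x still fit

slots = largest_feasible(fits, 10)
```

Each phase issues at most about $\log_2 n$ predicate calls, which corresponds to the $O(\log n)$ sequential calls to Length mentioned above.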

Analysis of Algorithm.

• Setting $L_1$ and $L_2$ in Step 3 requires $n$ parallel calls to the algorithm Length (in Step 1) [38]. If $C$ and $T$ are the cost and time required for the algorithm Length, then the cost of Steps 1 to 3 is $O(nC)$ and the time required is $O(T)$.

• For setting up $L'$ and $L''$ we calculate $X$ in Step 5, i.e., we call the algorithm Jump. In the algorithm Jump, $S_1(x)$ and $S_2(x)$ are calculated for each $x \in L_3$ by making $O(\log n)$ sequential calls (by doing an exponential and binary search) to the algorithm Length, or $O(n)$ parallel calls [38]. Thus the cost and time required for the algorithm Jump are $O(nC \log n)$ and $O(T \log n)$, or $O(n^2 C)$ and $O(T)$.

• If we sum up all the costs and times in the first recursive step, then the time and cost of the algorithm Parallel-Schedule are $O(T \log n)$ and $O(nC \log n)$, or alternatively $O(T)$ and $O(n^2 C)$.

• We use the all pairs longest path algorithm described in Chapter 5 in the algorithm Length. Then the algorithm Length takes $T = O(\log n)$ time with $C = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = o(n^3)$ cost on a minimum-CRCW PRAM.

• Thus the algorithm Parallel-Schedule takes $O(\log^2 n)$ time with $o(n^4 \log n)$ cost, which is also $O\left(n^4 (\log\log n)^{3/2} (\log n)^{1/2}\right)$, or $O(\log n)$ time with $o(n^5)$ cost, which is also $O\left(n^5 (\log\log n)^{3/2}/(\log n)^{1/2}\right)$, in the first recursive step.
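Theorem 8 The list scheduling problem for interval orders can be solved in $O(\log^3 n)$ time with $O\left(n^4 (\log\log n)^{3/2} (\log n)^{1/2}\right)$ cost or $O(\log^2 n)$ time with $O\left(n^5 (\log\log n)^{3/2}/(\log n)^{1/2}\right)$ cost on a minimum-CRCW PRAM.

Proof: The correctness follows from [38]. Step 1 ensures that the recursion depth of Parallel-Schedule is $O(\log n)$, because each recursive call reduces the schedule length by a factor of 2. The sizes of $S'$ and $S''$ are at most $n/2$. Let $C(n)$ denote the cost of the parallel algorithm; then we have the following recurrence relation, where $f(n)$ is the cost of finding the optimal schedule length and $f(n) = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = \Omega(n^2)$.

$$\begin{aligned}
C(n) &= 2C(n/2) + n \log n \, f(n) \\
&= 2^2 C(n/4) + 2 \cdot \tfrac{n}{2} \log(\tfrac{n}{2}) f(\tfrac{n}{2}) + n \log n \, f(n) \\
&\le 2^2 C(n/4) + n \log n \, f(\tfrac{n}{2}) + n \log n \, f(n) \\
&\le 2^3 C(n/8) + 2^2 \left[\tfrac{n}{4} \log(\tfrac{n}{4}) f(\tfrac{n}{4})\right] + n \log n \, f(\tfrac{n}{2}) + n \log n \, f(n) \\
&\le 2^3 C(n/8) + n \log n \, f(\tfrac{n}{4}) + n \log n \, f(\tfrac{n}{2}) + n \log n \, f(n) \\
&\le 2^k C(n/2^k) + n \log n \left[f(n) + f(\tfrac{n}{2}) + \ldots + f(\tfrac{n}{2^{k-1}})\right]
\end{aligned}$$

Since $f(n) = \Omega(n^2)$, $f(n/2) \le \frac{1}{2} f(n)$. Thus $\left[f(n) + f(\tfrac{n}{2}) + \ldots + f(\tfrac{n}{2^{k-1}})\right] = \Theta(f(n))$ (see the Master theorem [14]).

Since there are $O(\log n)$ levels of recursive calls, the total time taken by the algorithm is $O(\log^3 n)$, with $o(n^4 \log n)$ (or $O\left(n^4 (\log\log n)^{3/2} (\log n)^{1/2}\right)$) cost. In a similar fashion, we show below that the algorithm Parallel-Schedule takes $O(\log^2 n)$ time with $o(n^5)$ (or $O\left(n^5 (\log\log n)^{3/2}/(\log n)^{1/2}\right)$) cost. Here again $f(n)$ is the cost of finding the optimal schedule length, with $f(n) = O\left(n^3 (\log\log n)^{3/2}/(\log n)^{1/2}\right) = \Omega(n^2)$.

$$\begin{aligned}
C(n) &= 2C(n/2) + n^2 f(n) \\
&\le 2^2 C(n/4) + 2 \left(\tfrac{n}{2}\right)^2 f(\tfrac{n}{2}) + n^2 f(n) \\
&\le 2^3 C(n/8) + 2^2 \left[\left(\tfrac{n}{4}\right)^2 f(\tfrac{n}{4})\right] + \tfrac{n^2}{2} f(\tfrac{n}{2}) + n^2 f(n) \\
&\le 2^k C(n/2^k) + \left[n^2 f(n) + \tfrac{n^2}{2} f(\tfrac{n}{2}) + \ldots + \tfrac{n^2}{2^{k-1}} f(\tfrac{n}{2^{k-1}})\right] \\
&= O(n^2 f(n))
\end{aligned}$$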
Chapter 7
Conclusions 


In this thesis we have proposed improved parallel and sequential algorithms for different 
optimization problems. For the problem of all pairs shortest routing considered, we 
have reduced the cost of existing parallel algorithm proposed by Gupta and Krishnamurti 
[21], from O(n^log^n) to O(n^logn). We have also improved the O(n^) time S-T 
routing sequential algorithm proposed by them to $O(m + n \log n)$ time. For the single
vehicle routing problem with deadline time constraint on locations that are on a line, 
and for the single vehicle routing problem with release time constraint on locations 
that are on a line, we have improved the cost of existing algorithms given by Gupta and 
Krishnamurti [21], from 0(n® log^ n) to Oin'^ logn). For the TRP-line problem, we have 
proposed a new sequential algorithm that takes O(n^) time and a parallel algorithm that 
takes O(log^n) time, with 0(n^ log7^) cost. We have also proposed 0(n) time optimal 
parallel algorithms for dSVRPTW-line, rSVRPTW-line and TRP-line problems. 

To the best of our knowledge, no parallel algorithm is known for coalescing operations with precedence constraints in real-time systems. However, there are
sequential algorithms for this problem [30, 9]. We have proposed an O(log^n) time 
parallel algorithm, with O(n^logn) cost for this problem. Moreover, we have presented 
an 0(n) time optimal parallel algorithm for this problem. 

We have proposed sub-cubic cost algorithms for the all pairs longest path problem 
in directed acyclic graph, by modifying the all pairs shortest path algorithms proposed 
by Takaoka[40]. For the problem of scheduling interval ordered tasks, we have improved 
the cost of existing parallel algorithms proposed by Sunder et al. [38] by an $O(\log^2 n)$ factor.

In this thesis we have considered only single vehicle routing problems. It will be interesting to study multiple vehicle routing for the VRPTW-line and TRP-line problems and for architectures different from a line. For scheduling interval ordered tasks we have described an NC algorithm. It will also be interesting to see if randomization can be used to get better algorithms.